LARGE DEVIATIONS FOR A CLASS OF NONHOMOGENEOUS MARKOV CHAINS

By Zach Dietz and Sunder Sethuraman 
Tulane University and Iowa State University 

Large deviation results are given for a class of perturbed
nonhomogeneous Markov chains on finite state space which formally
includes some stochastic optimization algorithms. Specifically, let
$\{P_n\}$ be a sequence of transition matrices on a finite state space
which converge to a limit transition matrix $P$. Let $\{X_n\}$ be the
associated nonhomogeneous Markov chain where $P_n$ controls movement
from time $n-1$ to $n$. The main statements are a large deviation
principle and bounds for additive functionals of the nonhomogeneous
process under some regularity conditions. In particular, when $P$ is
reducible, three regimes that depend on the decay of certain
"connection" $P_n$ probabilities are identified. Roughly, if the decay
is too slow, too fast or in an intermediate range, the large deviation
behavior is, respectively, trivial, the same as the time-homogeneous
chain run with $P$, or nontrivial and involving the decay rates.
Examples of anomalous behaviors are also given when the approach
$P_n \to P$ is irregular. Results in the intermediate regime apply to
geometrically fast running optimizations, and to some issues in glassy
physics.



1. Introduction. The purpose of this paper is to provide some large
deviation bounds and principles for a class of nonhomogeneous Markov
chains related to some popular stochastic optimization algorithms such as
Metropolis and simulated annealing schemes. In a broad sense, these
algorithms are stochastic perturbations of steepest descent or "greedy"
procedures to find the global minimum of a function $H$ and are in the
form of nonhomogeneous Markov chains whose connecting transition kernels
converge to a limit kernel associated with steepest descent.

For instance, in the Metropolis algorithm on finite state space $\Sigma$,
the transition kernel connecting times $n-1$ and $n$ is given by

$$p_n(i,j) = \begin{cases} g(i,j)\exp\{-\beta_n (H(j)-H(i))_+\}, & \text{for } j \ne i,\\ 1 - \sum_{l \ne i} p_n(i,l), & \text{for } j = i, \end{cases}$$

where $g$ is an irreducible transition function and $\beta_n$ represents
an inverse temperature parameter which diverges, $\beta_n \to \infty$.
Here, the limit kernel $P = \lim_n P_n$ corresponds to steepest descent
in that jumps from $i$ to $j$ when $H(j) > H(i)$ are not allowed.
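
As a small illustration of the Metropolis kernel just described, the following sketch builds $P_n$ for a fixed inverse temperature and shows how the rows approach a reducible limit as $\beta$ grows. The function name `metropolis_kernel`, the three-state proposal matrix `g` and the energies `H` are hypothetical choices made only for this example; they are not taken from the paper.

```python
import math

def metropolis_kernel(g, H, beta):
    """Metropolis transition matrix for proposal matrix g, energies H and
    inverse temperature beta (all names are illustrative assumptions)."""
    r = len(H)
    P = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for j in range(r):
            if j != i:
                # uphill moves are damped by exp(-beta * (H(j) - H(i))_+)
                P[i][j] = g[i][j] * math.exp(-beta * max(H[j] - H[i], 0.0))
        # the diagonal absorbs the remaining mass so that the row sums to 1
        P[i][i] = 1.0 - sum(P[i][j] for j in range(r) if j != i)
    return P

# Hypothetical example: nearest-neighbour proposals on a 3-point path,
# with local minima of H at states 0 and 2.
g = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
H = [0.0, 1.0, 0.5]

P_hot = metropolis_kernel(g, H, beta=0.0)    # no damping: reduces to g
P_cold = metropolis_kernel(g, H, beta=50.0)  # close to the limit kernel P
```

In this toy example the cold kernel is nearly absorbing at states 0 and 2, while state 1 is left in one step, so the limit matrix is reducible with two local-minimum sets, which is the situation the paper is concerned with.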

These types of schemes are intensively used in image analysis [35], neural 
networks [4], statistical physics of glassy systems and combinatorial opti- 
mizations [26]. More general tutorials include [8, 16, 17] and [32]. 

Virtually all previous large deviations work with respect to optimization
chains has been through Freidlin-Wentzell-type methods [14]. This approach
is to consider a sequence of time-homogeneous Markov chains, parametrized
by temperature, which approaches the steepest descent chain as the
temperature cools, and then to transfer "short time" large deviation
estimates to a single related system in which temperature varies with
time. For instance, with respect to the Metropolis algorithm, by studying
the sequence of time-homogeneous chains $\{X^{\beta}_n : \beta > 0\}$
where $\beta_n \equiv \beta$ and $\beta \uparrow \infty$, estimates can
be made on the nonhomogeneous chain where $\beta_n$ varies. Although this
approach has had much success, especially related to statistical physics
metastability questions, it seems that only large deviation bounds are
recovered for the position of the nonhomogeneous process rather than
large deviation principles (LDPs) (see [2, 5, 6, 8, 9] and references
therein). It would then be natural to ask about LDPs for empirical
averages, which are more regular objects than the positions.

In a different, more general vein, LDPs have been shown for indepen- 
dent nonidentically distributed variables whose Cesaro empirical averages 
converge [29], and also for some types of Gibbs measures, which include 
nonhomogeneous chains whose connecting transition kernels are positive en- 
trywise and converge in Cesaro mean to a positive limit matrix [31]. 

Other work in the literature treats an intermediate case of nonhomogene- 
ity, namely Markov chains whose transition kernels are chosen at random 
from a time-homogeneous process. The results here are then to prove an 
LDP for almost all realized nonhomogeneous Markov chains chosen in this 
fashion [20, 30]. Also, we note that an LDP has been shown for a class of near 
irreducible time-homogeneous processes that satisfy some mixing conditions 



In this context, we develop here an LDP in natural scale $n$ with explicit
rate function for the empirical averages of nonhomogeneous Markov chains
on finite state spaces whose transition kernels converge to a general
limit matrix which allows for reducibility, a key concern in optimization
schemes. We note the methods used here differ from Freidlin-Wentzell-type
arguments in that they focus on the nonhomogeneous process itself rather
than homogeneous approximations. The specific techniques used are
constructive and involve various "surgeries" of path realizations and
some coarse graining.

Let $\Sigma = \{1, 2, \ldots, r\}$ be a finite set of points. Let
$P_n = \{p_n(i,j) : i,j \in \Sigma\}$ be a sequence of $r \times r$
stochastic matrices for $n \ge 1$ and let $\pi$ be a distribution on
$\Sigma$. Let now $\mathbb{P}_\pi = \mathbb{P}_\pi^{\{P_n\}}$ be the
(nonhomogeneous) Markov measure on the sequence space $\Sigma^\infty$
with Borel sets $\mathcal{B}(\Sigma^\infty)$ that corresponds to initial
distribution $\pi$ and transition kernels $\{P_n\}$. That is, with
respect to the coordinate process $X_0, X_1, \ldots$, we have the Markov
property

$$\mathbb{P}_\pi(X_{n+1} = j \mid X_0, X_1, \ldots, X_{n-1}, X_n = i) = p_{n+1}(i,j)$$

for all $i,j \in \Sigma$ and $n \ge 0$. We see then that $P_{n+1}$
controls "transitions" between times $n$ and $n+1$.

We now specify the class of nonhomogeneous processes focused on in this
article. Let $\pi$ be a distribution and let $P = \{p(i,j)\}$ be a
stochastic matrix on $\Sigma$. Define the collection

$$\mathcal{A}(P) = \{\mathbb{P}_\pi^{\{P_n\}} : P_n \to P\},$$

where the convergence $P_n \to P$ is elementwise, that is,
$\lim_{n\to\infty} p_n(i,j) = p(i,j)$ for all $i,j \in \Sigma$. The
collection $\mathcal{A}(P)$ can be thought of as perturbations of the
time-homogeneous Markov chain run with $P$ and is a natural class in
which to explore how nonhomogeneity enters into the large deviation
picture.

We also remark that this class has been studied in connection with other 
types of problems such as ergodicity [19], laws of large numbers [34, 35] and 
fluctuations [18]. See also [24] and [15] for some laws of large numbers and 
fluctuation results for generalized annealing algorithms and Markov chains 
with rare transitions. 

Let now $f : \Sigma \to \mathbb{R}^d$ be a ($d \ge 1$)-dimensional
function. Let also $\mathbb{P}_\pi \in \mathcal{A}(P)$ be a $P$-perturbed
nonhomogeneous Markov measure. In terms of the coordinate process, define
the additive sum $Z_n = Z_n(f)$ for $n \ge 1$ by

$$Z_n = \frac{1}{n} \sum_{i=1}^{n} f(X_i).$$
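
To make the objects $\mathbb{P}_\pi$ and $Z_n$ concrete, here is a minimal simulation sketch for scalar $f$. The helper names `simulate_average` and `kernel_at`, the two-state chain and the decay constant `C` are assumptions made for illustration only; the example kernel has connection probabilities of order $e^{-Cn}$ and converges to the identity matrix, which is reducible.

```python
import math
import random

def kernel_at(k, C=1.0):
    """Hypothetical two-state kernel P_k whose off-diagonal 'connection'
    probabilities decay like exp(-C * k); its limit is the identity."""
    eps = math.exp(-C * k)
    return [[1.0 - eps, eps], [eps, 1.0 - eps]]

def simulate_average(pi, kernel, f, n, rng):
    """Sample X_0 ~ pi, then X_1, ..., X_n with P_k driving step k-1 -> k,
    and return Z_n = (1/n) * sum_{i=1}^n f(X_i) for scalar-valued f."""
    states = range(len(pi))
    x = rng.choices(states, weights=pi)[0]
    total = 0.0
    for k in range(1, n + 1):
        P_k = kernel(k)
        x = rng.choices(states, weights=P_k[x])[0]
        total += f[x]
    return total / n

z = simulate_average([0.5, 0.5], kernel_at, [0.0, 1.0], 200, random.Random(0))
```

With $f$ the indicator of state 1, typical samples of `z` sit near 0 or 1, reflecting trapping in one of the two limiting irreducible sets; values near, say, 1/2 are the atypical deviations whose exponential cost the rate function quantifies.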

The specific goal of this paper is to understand the large deviation
behavior of the induced distributions of $\{Z_n : n \ge 1\}$ with respect
to $\mathbb{P}_\pi$ in scale $n$. That is, we search for a rate function
$J$ so that for Borel sets $B \subset \mathbb{R}^d$,

$$-\inf_{z \in B^\circ} J(z) \le \liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}_\pi(Z_n \in B) \le \limsup_{n\to\infty} \frac{1}{n} \log \mathbb{P}_\pi(Z_n \in B) \le -\inf_{z \in \bar{B}} J(z).$$

An immediate question which comes to mind is whether these large
deviations for the nonhomogeneous chain, if they exist, differ from the
deviations with respect to the time-homogeneous chain run with $P$. The
general answer found in our work is "Yes" and "No," and as might be
suspected depends on the rate of convergence $P_n \to P$ and the
structure of the limit matrix $P$.

More specifically, when $P$ is irreducible, it turns out that the large
deviation behavior of $\{Z_n\}$ under $\mathbb{P}_\pi$ is the same as
that under the time-homogeneous chain associated with $P$ and independent
of the rate of convergence of $P_n$ to $P$. (Note that [31] covers the
case when $P$ is positive entrywise and [29] covers the case when each
$P_n$ has identical rows.)

Perhaps the more interesting case is when the target matrix $P$ is
reducible. Indeed, this is the case with stochastic optimization
algorithms where $H$ has several local minima; for example, with respect
to the Metropolis process, the local minima sets of $H$ do not
communicate in the limit steepest descent chain. In this situation, the
large deviations of $\{Z_n\}$ depend on both the type of reducibilities
of $P$ and the decay rate, with respect to $P_n$, of certain "connection
probabilities" between $P$-irreducible sets, and fall into three
categories. Namely, when the decay is fast, or superexponential, the
large deviation behavior is the same as for the time-homogeneous Markov
chain run under $P$; when the speed is slow, or subexponential, a trivial
large deviation behavior is obtained; finally, when the speed is
intermediate, or when the connection probabilities are on the order
$e^{-Cn}$, a nontrivial behavior is found which differs from
stationarity.

We remark now, in terms of applications, the intermediate processes are 
important in situations such as (i) fast annealing simulations, and (ii) models 
of glass formation. 

(i) In Metropolis-type procedures, classic convergence theorems mandate
that the temperatures satisfy $\beta_n \le C_H \log n$ with respect to a
known constant $C_H$ for the process to converge to the global minima set
of $H$ (cf. [8] and [17]):

$$\lim_{n\to\infty} \mathbb{P}_\pi(X_n \in \text{global minima set of } H) = 1.$$

However, with only finite time and resources, the optimal logarithmic
speed is too slow to yield good results. In fact, in violation of classic
results, exponentially fast schemes where $\beta_n \sim n$ are often used
for which the process may actually converge to a nonglobal but local
minimum of $H$. Since connecting probabilities between local minima sets
are on the order of $\exp(-C\beta_n)$, these chains fit naturally in the
intermediate framework mentioned above (cf. discussion after
Corollary 3.1). Although there are some good error bounds for these
geometrically cooling experiments in finite time [7], it seems the
structure of the associated dynamics is not that well understood (cf.
[35], Section 6.2).

(ii) In the manufacture of glass, a hot, fired material is quickly
quenched into a substance which is not quite solid or liquid. The
interpretation is that under rapid cooling the constructed glass is
caught in a local energy optimum associated with some spatial disorder
(not the regularly structured global one associated with a solid) from
which over much longer time scales it may move to other states [22]. Such
glassy systems are intensively studied in the literature. Two rough
concerns can be identified: What are the typical glass landscapes which
specify the local optima, and what are the dynamics of the quick
quenching phase and beyond? Much discussion is focused on the first
concern [27], but even in systems where statics are quantified, dynamical
questions remain open ([23], Part IV, and [26]). However, with respect to
metastability, as mentioned earlier, much work has been accomplished (cf.
[3, 11] and [33] and references therein). Less work has been done though
when certain time inhomogeneities are severe, say on exponential scale
$e^{-Cn}$, in the context of Metropolis models in the intermediate
regime.

At this point, we observe, as alluded to above in the two examples, that
(from Borel-Cantelli arguments) the typical large scale picture of
general intermediate speed nonhomogeneous Markov chains is to get trapped
in one of the irreducible sets that correspond to the limit $P$ (e.g.,
the local $H$-minima sets in the Metropolis scheme). In this sense, the
large deviations rate function $J$, found with respect to averages
$\{Z_n\}$, is relevant to understanding how atypical deviations arise,
namely how the process average can "survive" for long times, that is, how
$Z_n \sim z$ for large $n$ when $z$ is not a $P$-irreducible set average.
More specifically, when $P$ corresponds to $K \ge 2$ irreducible sets
$\{C_{\zeta_i}\}$, we show that $J$ is an optimization between two types
of costs and is in the form

$$J(z) = \inf_{v \in \Omega_K}\ \inf_{x \in D(v,z)}\ \min_{\sigma \in \mathbb{S}_K} \Bigg\{ -\sum_{i=1}^{K-1} \Bigg( \sum_{j=1}^{i} v_{\sigma(j)} \Bigg) \mathcal{U}(\zeta_{\sigma(i)}, \zeta_{\sigma(i+1)}) + \sum_{i=1}^{K} v_{\sigma(i)} I_{\zeta_{\sigma(i)}}(x_{\sigma(i)}) \Bigg\}.$$

Here $I_{\zeta_i}$ is the rate function for the $P$ time-homogeneous
chain restricted to $C_{\zeta_i}$ and represents a "resting" cost of
moving within $C_{\zeta_i}$, and $\mathcal{U}(\zeta_j, \zeta_k)$ is a
large deviation "routing" cost of traveling between $C_{\zeta_j}$ and
$C_{\zeta_k}$. Also, $\mathbb{S}_K$ and $\Omega_K$ are the sets of
permutations and probabilities on $\{1, 2, \ldots, K\}$, respectively,
and $D(v,z)$ is the set of vectors $x$ such that
$\sum_{i=1}^{K} v_i x_i = z$. The intuition then is that $Z_n$ optimally
deviates to $z$ by visiting sets $\{C_{\zeta_i}\}$ finitely many times,
in a certain order $\sigma$ with time proportions $v$, so that the
average $z$ is maintained, and resting and routing costs are minimized.
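
For orientation, it may help to write out the simplest case of this optimization. The display below is only a specialization of the general form to $K = 2$ irreducible sets, with $\mathcal{U}$ the routing cost and $I_{\zeta_1}, I_{\zeta_2}$ the two resting costs; no new assumptions are made beyond taking $K = 2$.

```latex
% K = 2: the two permutations are (1,2) and (2,1). The routing term has a
% single summand, weighted by the time fraction spent in the first set visited.
\[
J(z) \;=\; \inf_{\substack{v_1, v_2 \ge 0 \\ v_1 + v_2 = 1}}\;
           \inf_{\substack{x_1, x_2 \in \mathbb{R}^d \\ v_1 x_1 + v_2 x_2 = z}}
  \Big[ \min\big\{ -v_1\,\mathcal{U}(\zeta_1,\zeta_2),\; -v_2\,\mathcal{U}(\zeta_2,\zeta_1) \big\}
        \;+\; v_1 I_{\zeta_1}(x_1) \;+\; v_2 I_{\zeta_2}(x_2) \Big].
\]
```

The weight $v_{\sigma(1)}$ on the routing cost reflects that the jump between sets is paid at time about $v_{\sigma(1)} n$, when the connection probability is of order $\exp\{\mathcal{U}\, v_{\sigma(1)} n\}$.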

Our main theorem (Theorem 3.3) is that under some natural regularity
conditions on the approach $P_n \to P$, the average $Z_n$ satisfies an
LDP with rate function $J$. When $\mathcal{U} \equiv -\infty$ or
$\mathcal{U} \equiv 0$, that is, when connection probabilities vanish too
fast or too slow, the rate $J$ reduces to the rate function for the
time-homogeneous chain run under $P$ or a trivial rate. When the
connections are exponential, $-\infty < \mathcal{U} < 0$ and $J$
nontrivially incorporates the convergence exponents (Corollary 3.1). Some
comments on the Metropolis algorithm are made at the end of Section 3.
When the approach is irregular, large deviation bounds (Theorems 3.1 and
3.2) and examples (Section 12) of anomalous behaviors are also given.

Finally, it is natural to ask about the large deviations on scales $a_n$
different from scale $n$, that is, the liminf and limsup limits of
$(1/a_n) \log \mathbb{P}_\pi(Z_n \in B)$. The metaresult should be, if
the typical system behavior is to be absorbed into certain sets, the
analogous large deviation (LD) behavior holds in scale $a_n$ with revised
resting and routing costs reflecting the scale. In fact, with respect to
the Metropolis model, by the methods in this article, large deviation
bounds and principles in scale $\beta_n$ can be derived as long as
$\liminf \beta_n / n^{\theta} = \infty$ for some $\theta > 0$. In
principle, similar results should hold when $\beta_n \ge C \log n$ and
$C > 1$, although this is not pursued here. On the other hand, large
deviation principles in scale $\beta_n \le \log n$ are of a completely
different category, because in this case there is no local minima
absorption (see, however, [9] for LD bounds with respect to metastability
concerns).

2. Preliminaries. We now recall and develop some definitions and notation
before arriving at the main theorems. Throughout, we use the convention
that $\pm\infty \cdot 0 = 0$ and $\log 0 = -\infty$.

2.1. Rate functions and extended LDP. Let
$I : \mathbb{R}^d \to \mathbb{R} \cup \{\infty\}$ be an extended
real-valued function. We say that $I$ is an extended rate function if $I$
is lower semicontinuous and, further, that $I$ is a good extended rate
function if, in addition, the level sets of $I$, namely
$\{x : I(x) \le a\}$ for $a \in \mathbb{R}$, are compact. This definition
extends the usual notion of rate function where negative values are not
allowed (cf. [10], Section 1.2). Namely, we say $I$ is a (good) rate
function if $I : \mathbb{R}^d \to [0, \infty]$ is a (good) extended rate
function.

We denote $Q_I \subset \mathbb{R}^d$ as the domain of finiteness,
$Q_I = \{x \in \mathbb{R}^d : I(x) < \infty\}$. We also recall the
standard notation for $B \subset \mathbb{R}^d$ that
$I(B) = \inf_{x \in B} I(x)$.

Let now $\{\mu_n : n \ge 1\}$ be a sequence of nonnegative measures with
respect to Borel sets on $\mathbb{R}^d$. We say that $\{\mu_n\}$
satisfies a large deviation principle with (extended) rate function $I$
if, for all Borel sets $B \subset \mathbb{R}^d$, we have

$$-\inf_{x \in B^\circ} I(x) \le \liminf_{n\to\infty} \frac{1}{n} \log \mu_n(B) \le \limsup_{n\to\infty} \frac{1}{n} \log \mu_n(B) \le -\inf_{x \in \bar{B}} I(x). \tag{2.1}$$

2.2. Nonnegative matrices. Let $U = \{u(i,j)\}$ be a matrix on $\Sigma$
and let $C \subset \Sigma$ be a subset of states. Define
$U_C = \{u(i,j) : i,j \in C\}$ as the corresponding submatrix. We say
that $U_C$ is nonnegative, denoted $U_C \ge 0$, if all entries are
nonnegative. Analogously, $U_C$ is positive, denoted $U_C > 0$, if its
entries are all positive. We say a nonnegative matrix $U_C$ is stochastic
if all rows add to 1, $\sum_{j \in C} u(i,j) = 1$ for all $i \in C$; of
course, $U_C$ is substochastic when $\sum_{j \in C} u(i,j) \le 1$ for all
$i \in C$. Also, we say $U_C$ is primitive if there is an integer
$k \ge 1$ such that $U_C^k > 0$ is positive. In addition, we say $U_C$ is
irreducible if, for any $i,j \in C$, there is a finite path
$i = x_0, x_1, \ldots, x_n = j$ in $C$ with positive weight,
$U_C(x_0,x_1) \cdots U_C(x_{n-1},x_n) > 0$. The period of a state
$i \in C$ is defined as $d_C(i) = \gcd\{n \ge 1 : U_C^n(i,i) > 0\}$. When
$U_C$ is irreducible, all states in $C$ have the same period $d_C$. When
$d_C = 1$, we say $U_C$ is aperiodic. Finally, note that $U_C$ is
primitive $\Leftrightarrow$ $U_C$ is irreducible and aperiodic
$\Leftrightarrow$ $(U_C)^k > 0$ for some $k \ge 1$.
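
The definitions of irreducibility, period and primitivity can be checked mechanically on small matrices. The sketch below works only with the zero/nonzero pattern of the matrix; the function names are hypothetical, the return-time horizon $3n$ is a bound chosen for this sketch (adequate for an irreducible $n \times n$ matrix), and the primitivity test uses Wielandt's classical bound $(n-1)^2 + 1$ on the exponent of a primitive matrix.

```python
from math import gcd

def support_power(A, k):
    """Boolean support (positivity pattern) of A^k for nonnegative square A."""
    n = len(A)
    S = [[A[i][j] > 0 for j in range(n)] for i in range(n)]
    R = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = [[any(R[i][m] and S[m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]
    return R

def is_irreducible(A):
    """Every pair (i, j), including i == j, is joined by a positive path."""
    n = len(A)
    powers = [support_power(A, k) for k in range(1, n + 1)]
    return all(any(R[i][j] for R in powers)
               for i in range(n) for j in range(n))

def period(A, i):
    """gcd of return times to i, scanning lengths 1..3n; 0 if no return."""
    n = len(A)
    d = 0
    for k in range(1, 3 * n + 1):
        if support_power(A, k)[i][i]:
            d = gcd(d, k)
    return d

def is_primitive(A):
    """A is primitive iff A^((n-1)^2 + 1) is entrywise positive (Wielandt)."""
    n = len(A)
    return all(all(row) for row in support_power(A, (n - 1) ** 2 + 1))
```

For instance, the two-state flip matrix `[[0, 1], [1, 0]]` is irreducible with period 2 and hence not primitive, while the $1 \times 1$ zero matrix fails irreducibility, matching the "degenerate" case singled out later in the upper block form.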

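2.3. Construction CON. We now construct a sequence of nonnegative
Markov-like measures. Let $U_k = \{u_k(i,j)\}$ for $1 \le k \le n$ be a
sequence of $r \times r$ nonnegative matrices. Let also $\pi$ be a
measure on $\Sigma$. Then define the nonnegative measure $\mathbb{U}_\pi$
on $\Sigma^n$ for $n \ge 1$, where $\mathbb{U}_\pi(X_0 \in B) = \pi(B)$
and

$$\mathbb{U}_\pi(\mathbf{X}_n \in B) = \sum_{x_0 \in \Sigma}\ \sum_{\mathbf{x}_n \in B} \pi(x_0) \prod_{i=1}^{n} u_i(x_{i-1}, x_i),$$

where $\mathbf{X}_n = (X_1, \ldots, X_n)$ is the coordinate process up to
time $n$. Let also $\mathbf{X}_i^j = (X_i, \ldots, X_j)$ for
$0 \le i \le j$ be the observations between times $i$ and $j$, and
denote, for $0 \le k \le m \le l$,

$$\mathbb{U}_{(k,\pi)}(\mathbf{X}_m^l \in B) = \mathbb{U}'_\pi(\mathbf{X}_{m-k}^{l-k} \in B),$$

where $\mathbb{U}'_\pi$ is made with respect to $U'_i = U_{i+k}$ for
$i \ge 1$. When $\pi$ is the point mass $\delta_x$ for $x \in \Sigma$, we
denote $\mathbb{U}_{(k,\delta_x)} = \mathbb{U}_{(k,x)}$ for simplicity.
The measure shares the Markov property:

$$\mathbb{U}_\pi(\mathbf{X}_{k+1}^{n} \in B) = \sum_{x_k \in \Sigma} \mathbb{U}_\pi(X_k = x_k)\, \mathbb{U}_{(k,x_k)}(\mathbf{X}_{k+1}^{n} \in B). \tag{2.2}$$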

2.4. LDP for homogeneous nonnegative processes. Let $U$ be a nonnegative
matrix on $\Sigma$. Let also $C \subset \Sigma$ and let
$f : \Sigma \to \mathbb{R}^d$ be a subset and function on the state
space.

For $\lambda \in \mathbb{R}^d$, define the "tilted" matrix
$\Pi_{C,\lambda,f,U} = \Pi_{C,\lambda}$ by

$$\Pi_{C,\lambda} = \{u(i,j)\, e^{\langle \lambda, f(j) \rangle} : i,j \in C\}.$$

Suppose now that $C$ is such that $U_C$ is irreducible. Then
$\Pi_{C,\lambda}$ is irreducible for all $\lambda$ and $f$, and we may
define

$$\rho(C,\lambda) = \rho(C,\lambda; f, U) \text{ as the Perron-Frobenius eigenvalue of } \Pi_{C,\lambda} \tag{2.3}$$

(cf. [10], Theorem 3.1.1, or [28]). Define also the extended function
$I_C = I_{C,f,U} : \mathbb{R}^d \to \mathbb{R} \cup \{\infty\}$ by

$$I_{C,f,U}(x) = \sup_{\lambda \in \mathbb{R}^d} \{\langle \lambda, x \rangle - \log \rho(C,\lambda)\}$$

and let $Q_C = Q_{I_C}$ be its domain of finiteness.
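
The tilted matrix, its Perron-Frobenius eigenvalue and the Legendre-type supremum can be approximated numerically for scalar $f$ ($d = 1$). The following is a rough sketch under stated assumptions: `perron_root` uses power iteration on $A + I$ (primitive whenever $A$ is irreducible and nonnegative), and `rate_function` replaces the supremum over $\lambda \in \mathbb{R}$ by a maximum over a finite grid, so it only gives a lower approximation. All names, the grid range and the iteration count are illustrative choices, not part of the paper.

```python
import math

def perron_root(A, iters=100):
    """Approximate Perron-Frobenius eigenvalue of an irreducible nonnegative
    matrix by power iteration on A + I (more iterations may be needed when
    the spectral gap is small)."""
    n = len(A)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [v[i] + sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)                 # sup-norm of (A + I) v
        v = [wi / lam for wi in w]   # renormalise so that max(v) == 1
    return lam - 1.0                 # undo the shift by the identity

def rate_function(U, f, x, lam_max=10.0, steps=1000):
    """Grid approximation of sup_lam { lam * x - log rho(lam) } for scalar f,
    where rho(lam) is the Perron root of {u(i,j) * exp(lam * f(j))}."""
    n = len(U)
    best = -math.inf
    for k in range(-steps, steps + 1):
        lam = lam_max * k / steps
        tilted = [[U[i][j] * math.exp(lam * f[j]) for j in range(n)]
                  for i in range(n)]
        best = max(best, lam * x - math.log(perron_root(tilted)))
    return best

# Sanity check on fair coin flips, where the rate function is known in
# closed form: I(x) = log 2 + x log x + (1 - x) log(1 - x) on (0, 1).
U = [[0.5, 0.5], [0.5, 0.5]]
f = [0.0, 1.0]
approx = rate_function(U, f, 0.25)
exact = math.log(2) + 0.25 * math.log(0.25) + 0.75 * math.log(0.75)
```

The comparison with the closed form for independent fair coin flips is included only as a check that the sketch is wired correctly; for a genuinely substochastic block $U_C$ the same routine applies with $U$ replaced by the submatrix.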

Let now $\pi$ be a distribution on $\Sigma$ and let $\mathbb{U}_\pi$ be
made from CON with $U_k = U$ for all $k \ge 1$. We call such a measure
$\mathbb{U}_\pi$ a homogeneous nonnegative process. Also, for
$x_0 \in C$, define the measures on $\mathbb{R}^d$ for $n \ge 2$ by

$$\mu_n(B) = \mathbb{U}_{x_0}(Z_n(f) \in B,\ \mathbf{X}_n \in C^n).$$

Define also for $1 \le k \le l$ that
$Z_k^l = Z_k^l(f) = (1/(l-k+1)) \sum_{i=k}^{l} f(X_i)$. Note, as
$|\Sigma| < \infty$, that $f$ is bounded,
$\|f\| = \max_{1 \le i \le d} \|f_i\|_{L^\infty} < \infty$, and so
$Z_k^l$ varies within the closed cube $K = B_{cu}(0, \|f\|)$ of width
$2\|f\|$ about the origin.

The following proposition is proved in the Appendix. 



Proposition 2.1. The function $I_C$ and domain $Q_C$ satisfy the
following criteria:

1. Domain $Q_C$ is a nonempty convex compact subset of the cube $K$.

2. Function $I_C$ is a good extended rate function. In fact, when $U_C$
is substochastic, $I_C$ is a good rate function.

3. Function $I_C$ is convex on $\mathbb{R}^d$ and strictly convex on the
relative interior of $Q_C$. Also, when restricted to $Q_C$, $I_C$ is
uniformly continuous and hence bounded on $Q_C$.

4. Measure $\{\mu_n\}$ satisfies an LDP (2.1) with extended rate function
$I_C$.



2.5. Upper block form. For a stochastic matrix $P = \{p(i,j)\}$ on
$\Sigma$, we now recall the upper block form. By reordering $\Sigma$ if
necessary, the matrix $P$ may be put in the form

$$P = \begin{pmatrix} U(0,0) & U(0,1) & \cdots & U(0,M_0) \\ 0 & S(1) & & 0 \\ \vdots & & \ddots & \\ 0 & 0 & & S(M_0) \end{pmatrix}, \tag{2.4}$$

where $1 \le M_0 \le r$ and $S(1), \ldots, S(M_0)$ are stochastic
irreducible submatrices that correspond to disjoint subsets of recurrent
states (denoted as stochastic sets), and submatrices
$U(0,0), \ldots, U(0,M_0)$ correspond to transient states when they
exist.

When there are transient states, the square block $U(0,0)$ itself may be
decomposed as (cf. [28], Section 1.2)

$$U(0,0) = \begin{pmatrix} R(1) & V(1,2) & \cdots & \cdots & V(1,N_0) \\ 0 & R(2) & V(2,3) & \cdots & V(2,N_0) \\ \vdots & & \ddots & & \vdots \\ 0 & \cdots & & 0 & R(N_0) \end{pmatrix},$$

where $1 \le N_0 \le r-1$ and $R(i)$ is either the $1 \times 1$ zero
matrix or an irreducible submatrix that corresponds to a subset of
transient states for $1 \le i \le N_0$. We call the $R(i) = [0]$ matrices
and corresponding states degenerate transient, and the irreducible $R(i)$
and associated states nondegenerate transient, since returns to these
states are, respectively, impossible and possible under the
time-homogeneous chain run with $P$.

Define the number of degenerate transient submatrices as

$$N = \begin{cases} 0, & \text{when no transient states in } P, \\ |\{1 \le i \le N_0 : R(i) = [0]\}|, & \text{otherwise.} \end{cases}$$

Also let the number of nondegenerate and stochastic submatrices be

$$M = \begin{cases} M_0, & \text{when no transient states in } P, \\ (N_0 - N) + M_0, & \text{otherwise.} \end{cases}$$



It will be useful to rewrite the upper block form by inserting the form
for $U(0,0)$ into (2.4). To this end, when there are transient states,
let $P(i) = R(i)$ for $1 \le i \le N_0$ and let $P(i) = S(i - N_0)$ for
$N_0 + 1 \le i \le N_0 + M_0$. When all states are recurrent, let
$P(i) = S(i)$ for $1 \le i \le M_0$. Also, in the following discussion,
let $T(i,j)$ for $i < j$ denote the appropriate "connecting" submatrix
$U(\cdot,\cdot)$ or $V(\cdot,\cdot)$. We remark that $T(i,j)$ is a matrix
of zeroes for $N_0 + 1 \le i < j \le N + M$.

We have now the canonical decomposition

$$P = \begin{pmatrix} P(1) & T(1,2) & \cdots & \cdots & T(1,N+M) \\ 0 & P(2) & T(2,3) & \cdots & T(2,N+M) \\ \vdots & & \ddots & & \vdots \\ 0 & \cdots & & 0 & P(N+M) \end{pmatrix}.$$

Let now $C_i = C_i(P) \subset \Sigma$ be the subset which corresponds to
$P(i)$ so that $P_{C_i} = P(i) = \{p(x,y) : x,y \in C_i\}$ for
$1 \le i \le N + M$. Define also the sets $\mathcal{D} = \mathcal{D}(P)$,
$\mathcal{N} = \mathcal{N}(P)$, $\mathcal{M} = \mathcal{M}(P)$ and
$\mathcal{G} = \mathcal{G}(P)$ by

$\mathcal{D} = \{i : P(i) \text{ degenerate transient}\}$,

$\mathcal{N} = \{i : P(i) \text{ nondegenerate transient}\}$,

$\mathcal{M} = \{i : P(i) \text{ stochastic}\}$,

$\mathcal{G} = \mathcal{N} \cup \mathcal{M}$ $(= \{i : P(i) \text{ nondegenerate transient or stochastic}\})$.

To link with previous notation, note that $N = |\mathcal{D}|$ and
$M = |\mathcal{G}|$.
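
The classification of blocks into degenerate transient, nondegenerate transient and stochastic sets can be computed directly from the positivity pattern of $P$. The sketch below is one possible way to do this via a transitive closure; `classify_blocks` is a hypothetical helper, states are numbered from 0, and the blocks are returned in order of their smallest state rather than in the canonical ordering used in the text.

```python
def classify_blocks(P):
    """Split the states of a stochastic matrix P into communicating blocks
    and label each as 'degenerate transient', 'nondegenerate transient'
    or 'stochastic' (closed and recurrent)."""
    n = len(P)
    # reach[i][j]: j can be reached from i in one or more steps
    reach = [[P[i][j] > 0 for j in range(n)] for i in range(n)]
    for m in range(n):               # Warshall transitive closure
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][m] and reach[m][j])
    blocks, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        # states communicating with i; a state with no return stands alone
        cls = [i] + [j for j in range(n)
                     if j != i and reach[i][j] and reach[j][i]]
        seen.update(cls)
        if not reach[i][i]:
            kind = "degenerate transient"
        elif all(not reach[i][j] or j in cls for j in range(n)):
            kind = "stochastic"
        else:
            kind = "nondegenerate transient"
        blocks.append((sorted(cls), kind))
    return blocks

# Hypothetical limit matrix: states 0 and 2 absorbing, state 1 left at once.
P_limit = [[1.0, 0.0, 0.0],
           [0.5, 0.0, 0.5],
           [0.0, 0.0, 1.0]]
blocks = classify_blocks(P_limit)
N = sum(kind == "degenerate transient" for _, kind in blocks)
M = len(blocks) - N
```

For this example one block is degenerate transient and two are stochastic, so $N = 1$ and $M = 2$ in the notation above.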

It will be convenient to enumerate the elements of $\mathcal{G}$ as
$\mathcal{G} = \{\zeta_1, \zeta_2, \ldots, \zeta_M\}$. Since $P(i)$ is
(sub)stochastic and irreducible for $i \in \mathcal{G}$, we may denote,
with respect to $f : \Sigma \to \mathbb{R}^d$, the rate function
$I_i = I_{C_i,f,P}$ and its domain of finiteness $Q_i = Q_{C_i}$. In
addition, let

$$p_{\min} = \min\{p(x,y) : p(x,y) > 0,\ x,y \in C_i,\ i \in \mathcal{G}\} \tag{2.5}$$

be the minimum positive transition probability in the irreducible
submatrices of $P$.

Consider now a sequence of transition matrices $\{P_n\}$, where
$P_n = \{p_n(i,j)\}$ converges to $P$. With respect to the sets
$\{C_i(P) : 1 \le i \le N + M\}$ above for the matrix $P$, the $n$th step
matrix $P_n$ can be put in the form

$$P_n = \begin{pmatrix} P_n(1) & T_n(1,2) & \cdots & \cdots & T_n(1,N+M) \\ T_n(2,1) & P_n(2) & T_n(2,3) & \cdots & T_n(2,N+M) \\ \vdots & T_n(3,2) & \ddots & & \vdots \\ T_n(N+M,1) & \cdots & \cdots & T_n(N+M,N+M-1) & P_n(N+M) \end{pmatrix},$$

where $P_n(i) = (P_n)_{C_i} \to P(i)$ for $1 \le i \le N + M$,
$T_n(i,j)$ governs $P_n$ transitions from $C_i$ to $C_j$, and
$T_n(i,j) \to T(i,j)$ for $i < j$ and vanishes otherwise. As a warning,
we note that the form above for $P_n$ is NOT the canonical decomposition
of $P_n$.

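2.6. Routing costs and deviations. Let $\mathbb{S}_M$ and $\Omega_M$ be
the set of permutations and the collection of probability vectors on
$\{1, 2, \ldots, M\}$,

$$\Omega_M = \Bigg\{ v \in \mathbb{R}^M : \sum_{i=1}^{M} v_i = 1,\ 0 \le v_i \le 1 \text{ for } 1 \le i \le M \Bigg\}.$$

For $v \in \Omega_M$ and $z \in \mathbb{R}^d$, define the set of convex
combinations

$$D(M,v,z) = \Bigg\{ \mathbf{x} = (x_1, \ldots, x_M) \in (\mathbb{R}^d)^M : \sum_{i=1}^{M} v_i x_i = z \Bigg\}. \tag{2.6}$$

Let also $U = \{u(i,j) : 1 \le i,j \le M\}$ be a matrix of extended
nonpositive real numbers. For a permutation $\sigma \in \mathbb{S}_M$,
$v \in \Omega_M$, $\mathbf{x} \in (\mathbb{R}^d)^M$ and
$z \in \mathbb{R}^d$, define the extended functions

$$C_{v,U}(\sigma,\mathbf{x}) = \begin{cases} -\sum_{i=1}^{M-1} \Big( \sum_{j=1}^{i} v_{\sigma(j)} \Big) U(\zeta_{\sigma(i)}, \zeta_{\sigma(i+1)}) + \sum_{i=1}^{M} v_{\sigma(i)} I_{\zeta_{\sigma(i)}}(x_{\sigma(i)}), & \text{for } M \ge 2, \\ I_{\zeta_1}(x_1), & \text{for } M = 1, \end{cases}$$

and

$$J_U(z) = \inf_{v \in \Omega_M}\ \inf_{\mathbf{x} \in D(M,v,z)}\ \min_{\sigma \in \mathbb{S}_M} C_{v,U}(\sigma,\mathbf{x}).$$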



It will be shown that $J_U$ is a good rate function (Proposition 4.1).
Moreover, it will turn out, for well chosen routing cost matrices $U$,
that $J_U(z)$ measures various upper and lower large deviation rates of
the additive sums $\{Z_n(f)\}$. Note that $J_U$ is defined in terms of
$\{\zeta_i\} = \mathcal{G}$ and depends on $i \in \mathcal{D}$ only
possibly through the routing cost $U$, which makes sense since it would
be too expensive to rest on degenerate transient states in any positive
time proportion. Also, we observe when $M = 1$, that is, when any
transient states with respect to $P$ do not allow returns, and $P$
corresponds to exactly one irreducible stochastic block, the function
$J_U(z) = I_{\zeta_1}(z)$ is independent of $U$.
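
To see how the routing and resting parts of $C_{v,U}(\sigma, \mathbf{x})$ combine, the following sketch evaluates the cost for a given visiting order and picks the cheapest order by brute force. It is a simplified illustration: the sets $\zeta_1, \ldots, \zeta_M$ are relabelled $0, \ldots, M-1$, the resting costs $I_{\zeta_i}(x_i)$ are assumed to be already evaluated and passed in as the list `rest`, and the function names are hypothetical.

```python
from itertools import permutations

def path_cost(v, U, sigma, rest):
    """Routing plus resting cost for visiting order sigma.
    v[i]    : time proportion spent in set i
    U[i][j] : routing exponent in [-inf, 0] from set i to set j
    rest[i] : resting cost of set i at its local average (assumed given)"""
    M = len(v)
    if M == 1:
        return rest[0]
    routing, elapsed = 0.0, 0.0
    for i in range(M - 1):
        elapsed += v[sigma[i]]       # time fraction used before the i-th jump
        u = U[sigma[i]][sigma[i + 1]]
        if elapsed > 0.0:            # convention (+-inf) * 0 = 0
            routing -= elapsed * u
    resting = sum(v[i] * rest[i] for i in range(M) if v[i] > 0.0)
    return routing + resting

def best_order(v, U, rest):
    """Visiting order minimising the cost, by exhaustive search."""
    return min(permutations(range(len(v))),
               key=lambda s: path_cost(v, U, s, rest))

v = [0.5, 0.5]
U = [[0.0, -1.0], [-3.0, 0.0]]
rest = [0.2, 0.4]
order = best_order(v, U, rest)
```

In this toy input, leaving set 0 for set 1 is cheaper than the reverse, so the order that starts in set 0 wins. Computing $J_U(z)$ itself would additionally require minimizing over $v$ and $\mathbf{x}$, which the sketch does not attempt.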

2.7. Upper and lower cost matrices. With respect to a
$\mathbb{P}_\pi \in \mathcal{A}(P)$, we now specify certain relevant
upper and lower costs $U$ when $N + M \ge 2$. Define, for distinct
$1 \le i,j \le N + M$,

$$t(n,(i,j)) = \max_{x \in C_i,\ y \in C_j} p_n(x,y) \tag{2.7}$$

and the extended nonpositive numbers

$$\upsilon(i,j) = \limsup_{n\to\infty} \frac{1}{n} \log t(n,(i,j)) \quad\text{and}\quad \tau(i,j) = \liminf_{n\to\infty} \frac{1}{n} \log t(n,(i,j)).$$

Also, for $0 \le k \le N + M - 2$, let $l_0 = i$, $l_{k+1} = j$ and let
$L_k = (l_0, l_1, \ldots, l_k, l_{k+1})$ be a $(k+2)$-tuple of distinct
indices. Now define the upper cost

$$\mathcal{U}_0(i,j) = \max_{0 \le k \le N+M-2}\ \max_{L_k}\ \sum_{s=0}^{k} \upsilon(l_s, l_{s+1}) \tag{2.8}$$

and the lower cost

$$\mathcal{T}_0(i,j) = \max_{0 \le k \le N+M-2}\ \max_{L_k}\ \sum_{s=0}^{k} \tau(l_s, l_{s+1}).$$

We remark briefly that $\mathcal{U}_0(i,j)$ and $\mathcal{T}_0(i,j)$
represent, respectively, maximal and minimal asymptotic travel costs of
moving from $C_i$ to $C_j$ in $k + 1 \le N + M - 1$ steps by visiting
sets $\{C_{l_s}\}$ in the order $L_k$.
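
The maximization over tuples $L_k$ of distinct indices is a best-path computation over simple routes, which for small $N + M$ can be done by brute force. The sketch below assumes the one-step exponents are supplied as a matrix `w` with entries in $[-\infty, 0]$; feeding it the $\limsup$ exponents gives the upper cost and feeding it the $\liminf$ exponents gives the lower cost. The function name and the 0-based indexing are illustrative assumptions.

```python
from itertools import permutations

def route_cost(w, i, j):
    """Maximum over tuples (i, l_1, ..., l_k, j) of distinct indices of the
    summed one-step exponents w[a][b]; k ranges over 0..len(w)-2."""
    n = len(w)
    others = [l for l in range(n) if l not in (i, j)]
    best = -float("inf")
    for k in range(len(others) + 1):
        for mid in permutations(others, k):
            path = (i,) + mid + (j,)
            total = sum(w[a][b] for a, b in zip(path, path[1:]))
            best = max(best, total)
    return best

# Hypothetical exponents: the direct move 0 -> 1 decays like exp(-5n),
# but routing through set 2 only costs exp(-n) * exp(-n).
w = [[0.0, -5.0, -1.0],
     [-5.0, 0.0, -1.0],
     [-1.0, -1.0, 0.0]]
cost_01 = route_cost(w, 0, 1)
```

Here the indirect route through the third set is asymptotically cheaper than the direct connection, which is why the costs are defined through a maximum over intermediate visits rather than through the one-step exponents alone.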

A more subtle lower cost $\mathcal{T}_1$ is the following. Let
$0 \le k \le N + M - 2$, $l_0 = i$, $l_{k+1} = j$ and $L_k$ be as before.
Let also

$$1 \le q_0, q_{k+1} \le r \quad\text{and, when } k \ge 1 \text{ and } 1 \le s \le k, \text{ let } 1 \le q_s \le r + 1, \tag{2.9}$$

and call $Q_k = (q_0, \ldots, q_{k+1})$. Let
$\mathbf{x}^0 = (x^0_1, \ldots, x^0_{q_0})$ and
$\mathbf{x}^{k+1} = (x^{k+1}_1, \ldots, x^{k+1}_{q_{k+1}})$ be vectors
with components in $C_i$ and $C_j$, respectively, and when $k \ge 1$, let
$\mathbf{x}^s = (x^s_1, \ldots, x^s_{q_s})$ be a vector with elements in
$C_{l_s}$ for $1 \le s \le k$. Denote also the $(k+2)$-tuple
$V_k = (\mathbf{x}^0, \mathbf{x}^1, \ldots, \mathbf{x}^{k+1})$.

For distinct $i,j \in \mathcal{G}$, and $y \in C_i$ and $z \in C_j$,
define

$$\gamma^1(n,y,z) = \max_{0 \le k \le N+M-2}\ \max_{L_k}\ \max_{Q_k}\ \max_{V_k}\ \mathbb{P}_{(n-1,y)}\big(\mathbf{X}_n^{n+r(k+1)} = (\mathbf{x}^0, \ldots, \mathbf{x}^{k+1}, z)\big),$$

where the concatenated vector
$(\mathbf{x}^0, \ldots, \mathbf{x}^{k+1}, z) = (x^0_1, \ldots, x^{k+1}_{q_{k+1}}, z)$
is of length at most $E(N,M) + 1$. Here,
$E(N,M) = (r+1)(M-2) + N + 2r$ and $r(u) = \sum_{s=0}^{u} q_s$ for
$0 \le u \le k + 1$.
Also define

$$\gamma^1(n,(i,j)) = \min_{y \in C_i,\ z \in C_j} \gamma^1(n,y,z). \tag{2.10}$$

Finally, define

$$\mathcal{T}_1(i,j) = \liminf_{n\to\infty} \frac{1}{n} \log \gamma^1(n,(i,j)).$$

We now interpret the objects $\gamma^1(n,y,z)$, $\gamma^1(n,(i,j))$ and
$\mathcal{T}_1(i,j)$. As with the routing cost $\mathcal{T}_0$, $L_k$ is
an ordered list of sets to visit on the way from point $y$ to point $z$.
More specifically here, $Q_k$ lists the $O(r)$ number of steps taken in
each visited set and $V_k$ details on which states this travel is made.
Here, $r$ is chosen since all movement in a given irreducible
$C_i \subset \Sigma$ is possible in at most $r = |\Sigma|$ steps. Then
$\gamma^1(n,y,z)$ is the largest probability of movement from $y$ to $z$
within the constraints of $O(r)$ travel among distinct sets. Also,
$\gamma^1(n,(i,j))$ is the smallest such chance of moving from $C_i$ to
$C_j$, and $\mathcal{T}_1(i,j)$ is the asymptotic exponential rate of
this quantity.

3. Results. We now come to the main results for processes
$\mathbb{P}_\pi \in \mathcal{A}(P)$. After a general upper bound and some
lower bounds which depend on natural assumptions, we present an LDP which
follows from these bounds. Some remarks on the Metropolis scheme and on
the format of the article are made at the end of this section.

The upper bound statement is the following. 

Theorem 3.1. With respect to good rate function $J_{\mathcal{U}_0}$ and
Borel $\Gamma \subset \mathbb{R}^d$, we have

$$\limsup_{n\to\infty} \frac{1}{n} \log \mathbb{P}_\pi(Z_n \in \bar{\Gamma}) \le -\inf_{z \in \bar{\Gamma}} J_{\mathcal{U}_0}(z).$$

We now label conditions and assumptions to give LD lower bounds.

Sufficient initial ergodicity. To avoid degenerate cases, we introduce an
initial ergodicity condition for $\mathbb{P}_\pi$ so that all information
about $P$ is relevant. A typical situation to avoid is when $P_n = P$ for
$n > m$, and distribution $\pi P_1 \cdots P_m$ locks the process
evolution into a strict $P$-irreducible subset of $\Sigma$. To avoid
lengthy technicalities and to be concrete, we impose the following
assumption on the chains considered in this article. Let
$n_0 = n_0(\{P_n\}) \ge 1$ be the first index $m$ so that for all
$s,t \in C_i$ and $i \in \mathcal{G}$ when $p(s,t) > 0$ we have
$p_n(s,t) > 0$ for $n \ge m$. Such an $n_0 < \infty$ exists since
$P_n \to P$.

Condition SIE. There is an $n_1 \ge n_0 - 1$ such that

$$\mathbb{P}_\pi(X_{n_1} \in C_i) > 0 \quad\text{for all } i \in \mathcal{G}.$$

A simpler condition which implies Condition SIE is the following.

Condition SIE-1. Let $n_0 = 1$ and let $\pi(C_i) > 0$ for all
$i \in \mathcal{G}$.

We say that a distribution $\pi$ is SIE-1 positive if $\pi(C_i) > 0$ for
all $i \in \mathcal{G}$. A trivial condition for SIE-1 positivity is when
$\pi$ is positive [e.g., when $\pi(x) > 0$ for all $x \in \Sigma$].

Assumptions A, B and C. We now state three assumptions on the regularity
of the asymptotic approach $P_n \to P$.

Assumption A. Suppose $\upsilon(i,j) = \tau(i,j)$ for all distinct
$1 \le i,j \le N + M$.

Assumption B. Suppose for all distinct $1 \le i,j \le N + M$ there exists
an element $a = a(i,j) \in C_i$ and a sequence
$\{b_n = b_n(i,j)\} \subset C_j$ such that

$$\tau(i,j) = \lim_{n\to\infty} \frac{1}{n} \log p_n(a, b_n).$$

In other words, $\tau(i,j)$ is achieved on a fixed departing point
$a \in C_i$.

Assumption C. Define $P^*(i) = \{p^*(s,t) : s,t \in C_i\}$ by

$$p^*(s,t) = \begin{cases} p(s,t), & \text{when } p(s,t) > 0, \\ 1, & \text{when } \liminf (1/n) \log p_n(s,t) = 0 \text{ and } p(s,t) = 0, \\ 0, & \text{otherwise.} \end{cases}$$

Suppose that $P^*(i)$ is primitive for $i \in \mathcal{G}$.

In words, Assumption A specifies that the maximal connection
probabilities in the $(1/n) \log$ sense have limits. Assumption B states
that $\tau(i,j)$ can be achieved in a systematic manner. Assumption C
ensures there is "primitivity" in the system and covers the case when $P$
is periodic but the approach $P_n \to P$ is slow enough to give a sense
of primitivity. We now list some easy sufficient conditions to verify
these assumptions.

Proposition 3.1.

LIM. Assumptions A and B hold if, for distinct $1 \le i,j \le N + M$ and
each pair $x \in C_i$ and $y \in C_j$, we have
$\lim_{n\to\infty} (1/n) \log p_n(x,y)$ exists.

PRM. Assumption C holds when $\{P(i) : i \in \mathcal{G}\}$ are
primitive.

We now come to lower bound statements for the process that obeys Con- 
dition SIE, the first of which holds in general and the second of which holds 
under Assumption B or C. 

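Theorem 3.2. Let $\mathbb{P}_\pi$ satisfy Condition SIE.

(i) Then with respect to good rate function $J_{\mathcal{T}_1}$ and Borel
$\Gamma \subset \mathbb{R}^d$, we have

$$-\inf_{z \in \Gamma^\circ} J_{\mathcal{T}_1}(z) \le \liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}_\pi(Z_n \in \Gamma^\circ).$$

(ii) In addition, when either Assumption B or C holds, we have with
respect to good rate function $J_{\mathcal{T}_0}$ that

$$-\inf_{z \in \Gamma^\circ} J_{\mathcal{T}_0}(z) \le \liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}_\pi(Z_n \in \Gamma^\circ).$$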

We note in the case $M = 1$ (i.e., when $P$ possesses exactly one
irreducible recurrent stochastic set and possibly some degenerate
transient states) that Theorems 3.1 and 3.2 already give an LDP with rate
function $J_{\mathcal{T}_1} = J_{\mathcal{U}_0} = I_{\zeta_1}$. In
particular, in this case, the large deviation behavior under
$\mathbb{P}_\pi$ is independent of the approach $P_n \to P$.

However, in the general situation when $M \ge 2$, the lower and upper
bounds may be different. In fact, there are nonhomogeneous processes for
which the lower and upper rate function bounds in Theorems 3.1 and 3.2(i)
differ and are achieved so that the result is sharp in a certain sense
(e.g., the example in Section 12.2).

Also, we remark that the two lower bounds in Theorem 3.2 may differ when
there is some periodicity in the system and the maximal connection weight
sequence is not regular. In this case, the process may not be allowed to
visit freely various states because certain cyclic patterns may be in
force. Therefore, the asymptotic routing costs in this general case
should be larger than under Assumption B or C when some regularity is
imposed on connection probabilities or when a form of primitivity is
present; hence, the use of $\mathcal{T}_1$ instead of $\mathcal{T}_0$ in
the lower estimates. See Section 12.3 for an explicit process where lower
bounds do not respect $\mathcal{T}_0$.

It is natural now to ask when the lower and upper bounds match in the previous results so that a large deviation principle holds. For $z \in \mathbb{R}^d$, let

$$J(z) = \mathbb{J}_{\mathcal{U}_0}(z).$$

Under Assumption A, costs $\mathcal{T}_0 = \mathcal{U}_0$ and so the following is a direct corollary of Theorems 3.1 and 3.2.

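Theorem 3.3. Suppose $\mathbb{P}_\pi$ satisfies Condition SIE and Assumption A, and also either Assumption B or C. Then, with respect to the good rate function $J$ and Borel sets $\Gamma \subset \mathbb{R}^d$, we have the LDP

$$-\inf_{z\in\Gamma^\circ}J(z) \le \liminf_{n\to\infty}\frac1n\log\mathbb{P}_\pi(Z_n\in\Gamma^\circ) \le \limsup_{n\to\infty}\frac1n\log\mathbb{P}_\pi(Z_n\in\bar\Gamma) \le -\inf_{z\in\bar\Gamma}J(z).$$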

Hence, by Proposition 3.1, when all limits exist (LIM; in particular, e.g., in the time-homogeneous case, $P_n = P$) or when Assumption A holds and there is no periodicity (PRM), the LDP is available. Also note that by taking $f(x) = (1_1(x), 1_2(x), \ldots, 1_r(x))$, the vector of state indicators, Theorem 3.3 gives the LDP for the empirical measure and so is a form of Sanov's theorem for these nonhomogeneous chains.
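A small sketch of the remark on the empirical measure: with the indicator vector $f$, the average $Z_n = (1/n)\sum_i f(X_i)$ is exactly the vector of visit frequencies of the path. The function name and the 0-based state labels are illustrative assumptions, not notation from the paper.

```python
from collections import Counter

def empirical_measure(path, num_states):
    """Return Z_n = (1/n) * sum_i f(X_i) for f(x) = (1_0(x), ..., 1_{r-1}(x)).

    States are labeled 0..num_states-1 (an illustrative convention).
    """
    n = len(path)
    counts = Counter(path)
    return [counts.get(s, 0) / n for s in range(num_states)]

# Hypothetical path of length 8 on three states.
path = [0, 2, 2, 1, 2, 0, 2, 2]
z = empirical_measure(path, 3)
```

Each coordinate of `z` is the fraction of time spent in one state, so the coordinates sum to one.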

We remark that it may be tempting to think Assumption A by itself 
may be sufficient for an LDP, but it turns out there are processes which 
satisfy Condition SIE and Assumption A but neither B nor C for which the 
LDP cannot hold (e.g., the example in Section 12.3). On the other hand, 
we note that Assumption A is not even necessary for an LDP, for instance, 
with respect to chains where P n alternates between two alternatives (cf. 
Section 12.1). So although Theorem 3.3 is broad in a sense, more work is 
required to identify necessary and sufficient conditions for an LDP. 

We now comment on the three types of LD behaviors mentioned in the 
Introduction which follow from Theorem 3.3. These are (1) homogeneous, 
(2) trivial and (3) intermediate behaviors for which easy sufficient (but not 
necessary) conditions are given below. 

Corollary 3.1. Let Condition SIE, Assumption A and either Assumption B or C hold. Let also $N + M \ge 2$.

1. Suppose $v(i,j) = -\infty$ when $\limsup_n t(n,(i,j)) = 0$ for distinct $1 \le i,j \le N+M$. Then $J$ is also the rate function for the time-homogeneous chain run under $P$ (because the routing costs are the same as if $P_n = P$).

2. Suppose $|\mathcal{M}| \ge 2$ and $\mathcal{U}_0(i,j) = 0$ for all distinct $i,j \in \mathcal{M}$. Then $J$ vanishes on the convex hull of $\bigcup_{i\in\mathcal{M}}\{z : I_i(z) = 0\}$ and so is in a sense trivial.

3. Suppose $|\mathcal{M}| \ge 2$ and $\mathcal{U}_0(i,j) \in (-\infty, 0)$ for all distinct $i,j \in \mathcal{M}$. Then $J$ differs from the rate function for the time-homogeneous chain run with $P$ and also involves nontrivially the convergence speed of $P_n$ to $P$ in terms of routing costs.
We now briefly comment on application to the Metropolis algorithm. Note that

$$1 - \sum_{j\ne i}p_n(i,j) = g(i,i) + \sum_{j\ne i}g(i,j)\bigl[1 - \exp\{-\beta_n(H(j)-H(i))^+\}\bigr].$$

Also, as $n\to\infty$, we have $\lim_{n\to\infty}g(i,j)\exp\{-\beta_n(H(j)-H(i))^+\} = g(i,j)1_{[H(j)\le H(i)]}$. Therefore, the limit matrix $P$ is formed in terms of entries

$$\lim_{n\to\infty}p_n(i,j) = \begin{cases} g(i,j)1_{[H(j)\le H(i)]}, & \text{if } i\ne j,\\ g(i,i) + \sum_{l\ne i}g(i,l)1_{[H(l)>H(i)]}, & \text{if } i=j.\end{cases}$$

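We now decompose $P$ into components $\mathcal{D}$, $\mathcal{N}$ and $\mathcal{M}$. First, note that a state $x \in \Sigma$ belongs to the "level" set

$$C_x = \Bigl\{y : \exists\text{ path } x = x_0,\ldots,x_n = y,\text{ where }\prod_{i=0}^{n-1}g(x_i,x_{i+1}) > 0\text{ and } H(x_i) = H(x)\text{ for } 1\le i\le n\Bigr\},$$

which corresponds to one of three types, $\mathcal{D}$, $\mathcal{N}$ or $\mathcal{M}$.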

In particular, $C_x$ is a stochastic set that corresponds to $\mathcal{M}$ exactly when $H(x) = \min\{H(y) : g(x,y) > 0\}$ is a local minimum. Also, $C_x$ is a nondegenerate transient set exactly when $H(x)$ is not a local minimum and either $g(x,x) > 0$ or $g(x,y) > 0$, where $H(y) = H(x)$. Additionally, $C_x$ is a degenerate singleton exactly when $H(x)$ is not a local minimum, $g(x,x) = 0$, and when $g(x,y) > 0$ we have $H(y) \ne H(x)$.

We now discuss the rate of convergence $P_n \to P$. Observe for distinct $1 \le i,j \le N+M$, and $x \in C_i$ and $y \in C_j$ that

$$\limsup_{n\to\infty}\frac1n\log p_n(x,y) = -(H(y)-H(x))^+\limsup_{n\to\infty}(\beta_n/n),\qquad\text{if } g(x,y) > 0,$$

with analogous expressions for $\liminf(1/n)\log p_n(x,y)$. Hence LIM holds when $\beta = \lim\beta_n/n$ exists. Also, we remark that when $g(x,x) > 0$ for $x \in \Sigma$, there are no degenerate transient states, so all $P$ submatrices are primitive and PRM holds. In addition, given irreducibility of $g$, Condition SIE is satisfied with respect to any initial distribution $\pi$.

Therefore, by Corollary 3.1, as routing costs are computed with respect to different level sets, the three types of LD behavior follow when the limit $\beta$ exists and there is more than one local minimum. Namely, trivial, intermediate or homogeneous behaviors occur when $\beta = 0$, $\beta \in (0,\infty)$ or $\beta = \infty$, respectively.
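The Metropolis kernel and its limit can be written out in a few lines. The sketch below follows the entry formulas of this section; the three-state landscape `H`, the proposal kernel `g` and the function names are hypothetical choices made only for illustration.

```python
import math

def metropolis_matrix(g, H, beta):
    """Matrix with p(i,j) = g(i,j) * exp(-beta * (H[j]-H[i])^+) for j != i.

    The diagonal entry collects the remaining mass so that each row sums to 1.
    """
    r = len(H)
    P = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for j in range(r):
            if j != i:
                P[i][j] = g[i][j] * math.exp(-beta * max(H[j] - H[i], 0.0))
        P[i][i] = 1.0 - sum(P[i][j] for j in range(r) if j != i)
    return P

def limit_matrix(g, H):
    """Limit as beta -> infinity: uphill proposals are rejected and add to the diagonal."""
    r = len(H)
    P = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for j in range(r):
            if j != i and H[j] <= H[i]:
                P[i][j] = g[i][j]
        P[i][i] = 1.0 - sum(P[i][j] for j in range(r) if j != i)
    return P

# Hypothetical landscape: local minima at states 0 and 2, barrier at state 1.
H = [0.0, 1.0, 0.5]
g = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]

P_n = metropolis_matrix(g, H, beta=50.0)   # geometric cooling beta_n = n at n = 50
P_lim = limit_matrix(g, H)
```

With $\beta_n = n$ the uphill entry `P_n[0][1]` equals $e^{-n(H(1)-H(0))}$, so $(1/n)\log p_n(0,1) = -1$ here, which is the kind of decay rate that enters the routing costs in the intermediate regime.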

Finally, we give a concrete example with respect to a simple geometrically cooling Metropolis chain where $\beta = 1$. Let $H$ be defined on $\Sigma = \{1, 2, \ldots, 9\}$ in terms of its graph (Figure 1) and let $f(x) = H(x)$, so that $Z_n$ is the average $H$ value seen by the chain. Typically, for large $n$, these values $Z_n$ will be near the $H$ value of a local minimum.

Fig. 1. Graph of H.

Let the kernel $g$ be a random walk so that $g(i,i+1) = 1/2$ for $i = 2, 6, 7, 8$, $g(i+1,i) = 1/2$ for $i = 1, 2, 6, 7$, and $g(1,2) = 1$, $g(9,8) = 1$, $g(3,4) = 1/2$, $g(4,3) = (1-a)/2$, $g(4,4) = a$, $g(4,5) = (1-a)/2$, $g(5,4) = (1-b)/2$, $g(5,5) = b$ and $g(5,6) = (1-b)/2$ with $0 < a, b < 1$. Then states $\{2\}$, $\{6\}$, $\{8\}$ are distinct local minima, $\{4\}$, $\{5\}$ are nondegenerate transient singletons and the remaining states are degenerate transient.

The routing costs satisfy, for distinct sets,

$$\mathcal{U}_0(\{i\},\{j\}) = \begin{cases} -\sum_{l=i}^{j-1}(H(l+1)-H(l))^+, & \text{for } i<j,\\ -\sum_{l=j}^{i-1}(H(l)-H(l+1))^+, & \text{for } i>j.\end{cases}$$

Also, the rate functions that correspond to local minima 2, 6 and 8 are degenerate, and equal $\infty\cdot1_{[y\ne H(2)]}$, $\infty\cdot1_{[y\ne H(6)]}$ and $\infty\cdot1_{[y\ne H(8)]}$, respectively. For the nondegenerate transient states 4 and 5, we have

$$I_{\{4\}}(y) = \begin{cases} -\log\dfrac{1+a}{2}, & \text{for } y = H(4),\\ \infty, & \text{otherwise,}\end{cases}$$

and

$$I_{\{5\}}(y) = \begin{cases} -\log\dfrac{1+b}{2}, & \text{for } y = H(5),\\ \infty, & \text{otherwise.}\end{cases}$$



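When $-\log((1+a)/2) = 1/3$ and $-\log((1+b)/2) = 2/3$, we compute, by analyzing the not-too-large number of possibilities, the nonconvex rate function

$$J(z) = \begin{cases} \infty, & \text{for } z < -1 \text{ and } z > 3,\\ 4z/9 + 4/9, & \text{for } -1 \le z \le -2/11,\\ -2z, & \text{for } -2/11 \le z \le 0,\\ z/6, & \text{for } 0 \le z \le 2,\\ 5z/3 - 3, & \text{for } 2 \le z \le 12/5,\\ -5z/3 + 5, & \text{for } 12/5 \le z \le 3.\end{cases}$$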



Not surprisingly, $J$ vanishes at local minima and is largest near $z \sim 2^+$ (excluding infinite costs), with exact value $z = 12/5$ found from computation. The $J$ calculation (see Figure 2) also gives optimal scenarios under which $Z_n \sim z$; these include, for $-1 \le z \le -2/11$, that the average $Z_n$ is a convex combination of rest stays initially on $\{4\}$ and then at $\{8\}$; for $-2/11 \le z \le 0$, at $\{8\}$, then $\{6\}$; for $0 \le z \le 2$, at $\{4\}$, then $\{6\}$; for $2 \le z \le 12/5$, at $\{2\}$, then $\{4\}$; for $12/5 \le z \le 3$, at $\{6\}$, then $\{2\}$.
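As a numerical companion to the piecewise formula for $J$, the following sketch evaluates it with exact rational arithmetic and makes it easy to check that adjacent branches agree at the breakpoints. The function name `rate_J` is an illustrative choice; only the branch formulas come from the text.

```python
import math
from fractions import Fraction

def rate_J(z):
    """Piecewise rate function of the example; math.inf outside [-1, 3]."""
    z = Fraction(z)
    if z < -1 or z > 3:
        return math.inf
    if z <= Fraction(-2, 11):
        return Fraction(4, 9) * z + Fraction(4, 9)
    if z <= 0:
        return -2 * z
    if z <= 2:
        return z / 6
    if z <= Fraction(12, 5):
        return Fraction(5, 3) * z - 3
    return Fraction(-5, 3) * z + 5

# Values at the zeros, the breakpoints and one point outside the domain.
values = {z: rate_J(z) for z in (-1, Fraction(-2, 11), 0, 2, Fraction(12, 5), 3, 4)}
```

The function vanishes at $z = -1$, $0$ and $3$ yet is strictly positive at $z = -2/11$, which lies between two of its zeros; this is one direct way to see the nonconvexity.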

We now discuss the plan of the paper. In the next section, we outline the 
proof structure of the main theorems. After supplying proofs of stated results 
in the outline in Sections 5-11, we give the three examples in Section 12 
commented on earlier. Finally, in the Appendix some technical proofs are 
collected. 



4. Outline of the proofs of the main theorems. Consider a process $\mathbb{P}_\pi \in \mathbb{A}(P)$ and a function $f : \Sigma \to \mathbb{R}^d$. We first observe that $\mathbb{J}_{\mathcal{U}_0}$, $\mathbb{J}_{\mathcal{T}_0}$ and $\mathbb{J}_{\mathcal{T}_1}$ are all good rate functions from the following proposition, which is proved in the Appendix.

Proposition 4.1. For a nonpositive cost $U$, the function $\mathbb{J}_U$ is a good rate function and the domain of finiteness $Q_{\mathbb{J}_U} \subset K$.

In the following discussion, we say that the path $\mathbf{X}_n$ enters or visits a subset $C \subset \Sigma$ when $X_i \in C$ for some $1 \le i \le n$. We now outline the proofs of Theorems 3.1 and 3.2.
4.1. Upper bounds: proof of Theorem 3.1. The proof follows by first a surgery of paths estimate, then a homogeneous rest cost comparison, a coarse graining cost estimate and finally a limit relationship on a perturbed rate function. Let $\Gamma \subset \mathbb{R}^d$ be a Borel set.

Surgery of paths estimate. The first step is to overestimate $\mathbb{P}_\pi$ by another measure $\hat\mu_{\pi,\epsilon_1,\epsilon_2}$ which allows more movement in terms of parameters $\epsilon_1, \epsilon_2 > 0$. However, we restrict the process to those paths which make at most one long sojourn to each of the sets $\{C_i : 1 \le i \le N+M\}$, but connect among them in short visits.

Before getting to the first bound, the following technical monotonicity 
lemma, proved in the Appendix, is needed. 

Lemma 4.1. Let $\delta \in [0,1]$ and let $\{t_n\} \subset [0,1]$ be a sequence which converges to $\delta$. Then there exists a sequence $\{\hat t_n\} \subset (0,1]$ such that (i) $t_n \le \hat t_n$, (ii) $\hat t_n \downarrow \delta$ monotonically and (iii) the limit $\lim(1/n)\log\hat t_n$ exists and equals

$$\lim_{n\to\infty}\frac1n\log\hat t_n = \limsup_{n\to\infty}\frac1n\log t_n.$$



Fig. 2. Graph of J.
Recall now the definition of $t(n,(i,j))$ [cf. (2.7)] and let $\{\hat t(n,(i,j))\}$ be the sequence made from $\{t(n,(i,j))\}$ and Lemma 4.1.

Also, for distinct $1 \le i,j \le N+M$, as in the definition of $\mathcal{U}_0(i,j)$ [cf. (2.8)], let $0 \le k \le N+M-2$, let $l_0 = i$ and $l_{k+1} = j$, and let $L_k = (l_0, l_1, \ldots, l_k, l_{k+1})$ be composed of distinct indices. Then define

$$\gamma(n,(i,j)) = \max_{0\le k\le N+M-2}\ \max_{L_k}\ \prod_{s=0}^{k}\hat t(n+s,(l_s,l_{s+1})).$$

The term $\gamma(n,(i,j))$ bounds the largest possible transition probability between sets $C_i$ and $C_j$ in at most $N+M-1$ steps.

We now create a certain sequence of positive transition matrices. For general $P$ and approaching sequence $\{P_n\}$, the submatrices $P(i)$ and $P_n(i)$ for $1 \le i \le N+M$ need not be positive. It will be helpful, however, to majorize them as follows. Let $\epsilon > 0$, and let $P(i,\epsilon) = \{p(s,t;\epsilon) : s,t \in C_i\}$ and $P_n(i,\epsilon) = \{p_n(s,t;\epsilon) : s,t \in C_i\}$, where

$$p(s,t;\epsilon) = \max\{p(s,t),\epsilon\}\quad\text{and}\quad p_n(s,t;\epsilon) = \max\{p_n(s,t),\epsilon\}.$$
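The majorization is an entrywise maximum with $\epsilon$. A minimal sketch, with a hypothetical $2\times2$ block and an illustrative function name:

```python
def majorize(P_block, eps):
    """Return the entrywise maximum of a (sub)matrix with eps > 0."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return [[max(p, eps) for p in row] for row in P_block]

# Hypothetical block with a zero entry.
P_block = [[0.0, 1.0],
           [0.3, 0.7]]
P_eps = majorize(P_block, 0.01)
```

Every entry of `P_eps` is strictly positive and dominates the corresponding entry of `P_block`. Rows may now sum to more than one, so the result is a positive weight matrix rather than a stochastic matrix.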

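Define now $\hat P_{n,\epsilon_1,\epsilon_2} = \{\hat p_{n,\epsilon_1,\epsilon_2}(s,t)\}$ by

$$\hat p_{n,\epsilon_1,\epsilon_2}(s,t) = \begin{cases}\gamma(n,(i,j)), & \text{for } s\in C_i,\ t\in C_j \text{ and distinct } 1\le i,j\le N+M,\\ p_n(s,t;\epsilon_2), & \text{for } s,t\in C_i \text{ and } i\in\mathcal{G},\\ p_n(s,t;\epsilon_1), & \text{for } s,t\in C_i \text{ and } i\in\mathcal{D},\end{cases}$$

when $n \ge 2$; for $n = 1$, let $\hat P_{1,\epsilon_1,\epsilon_2}$ be the unit constant matrix, $\hat p_{1,\epsilon_1,\epsilon_2}(s,t) \equiv 1$. Form also through CON the measure $\hat\mu_{\pi,\epsilon_1,\epsilon_2}$ with respect to initial distribution $\pi$ and transition matrices $\{\hat P_{n,\epsilon_1,\epsilon_2}\}$.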

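Proposition 4.2. For $\epsilon_1,\epsilon_2 > 0$, the following upper bound holds:

$$\limsup_{n\to\infty}\frac1n\log\mathbb{P}_\pi(Z_n\in\Gamma) \le \limsup_{n\to\infty}\frac1n\log\hat\mu_{\pi,\epsilon_1,\epsilon_2}(Z_n\in\Gamma,\ \mathbf{X}_n\text{ enters each } C_i \text{ at most once}).$$

The proof of this proposition is found in Section 5.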

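Homogeneous rest cost comparison. Next, we compare measure $\hat\mu_{\pi,\epsilon_1,\epsilon_2}$ with a measure $\bar\mu_{\pi,\epsilon_1,\epsilon_2}$, which replaces nonhomogeneous transitions within sets $C_i$ by limiting homogeneous transition weights.

Define, for $\epsilon_1,\epsilon_2 > 0$, $\bar P_{n,\epsilon_1,\epsilon_2} = \{\bar p_{n,\epsilon_1,\epsilon_2}(s,t)\}$ by

$$\bar p_{n,\epsilon_1,\epsilon_2}(s,t) = \begin{cases}\gamma(n,(i,j)), & \text{for } s\in C_i,\ t\in C_j \text{ and distinct } 1\le i,j\le N+M,\\ p(s,t;\epsilon_2), & \text{for } s,t\in C_i \text{ and } i\in\mathcal{G},\\ \epsilon_1, & \text{for } s,t\in C_i \text{ and } i\in\mathcal{D},\end{cases}$$

when $n \ge 2$ and $\bar P_{1,\epsilon_1,\epsilon_2} = \hat P_{1,\epsilon_1,\epsilon_2}$. Let now $\bar\mu_{\pi,\epsilon_1,\epsilon_2}$ be formed from CON and matrices $\{\bar P_{n,\epsilon_1,\epsilon_2}\}$ and $\pi$.

Proposition 4.3. For $\epsilon_1,\epsilon_2 > 0$, we have

$$\limsup_{n\to\infty}\frac1n\log\hat\mu_{\pi,\epsilon_1,\epsilon_2}(Z_n\in\Gamma,\ \mathbf{X}_n\text{ enters each } C_i \text{ at most once}) \le \limsup_{n\to\infty}\frac1n\log\bar\mu_{\pi,\epsilon_1,\epsilon_2}(Z_n\in\Gamma,\ \mathbf{X}_n\text{ enters each } C_i \text{ at most once}).$$

The proof of this proposition is found in Section 7.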

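Coarse graining estimate. The next step is to further bound the right-hand side in Proposition 4.3 through a detailed decomposition of visit times and locations in terms of an $\epsilon_1,\epsilon_2$-perturbed rate $\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}$.

Observe for $1 \le i \le N+M$ that the submatrix $(\bar P_{n,\epsilon_1,\epsilon_2})_{ii} = P(i,\epsilon_1,\epsilon_2)$ is independent of $n$ and

$$P(i,\epsilon_1,\epsilon_2) = \begin{cases}(\epsilon_1), & \text{for } i\in\mathcal{D},\\ P(i,\epsilon_2), & \text{for } i\in\mathcal{G}.\end{cases}$$

Denote the extended rate function $I_{i,\epsilon_1,\epsilon_2} = \mathbb{I}_{C_i,f,P(i,\epsilon_1,\epsilon_2)}$ and associated domain of finiteness $Q_{i,\epsilon_1,\epsilon_2} = Q_{C_i,f,P(i,\epsilon_1,\epsilon_2)}$. In fact, explicitly when $i \in \mathcal{D}$,

$$I_{i,\epsilon_1,\epsilon_2}(x) = \begin{cases}-\log(\epsilon_1), & \text{for } x = f(m_i), \text{ where } C_i = \{m_i\},\\ \infty, & \text{otherwise,}\end{cases}\tag{4.1}$$

and $I_{i,\epsilon_1,\epsilon_2}(x) = I_{i,\epsilon_2}(x) = \mathbb{I}_{C_i,f,P(i,\epsilon_2)}(x)$ when $i \in \mathcal{G}$.

Recall now the object $C_{v,U}$ near (2.6), and define for $v \in \Omega_{N+M}$, $x \in (\mathbb{R}^d)^{N+M}$, $\sigma \in S_{N+M}$ and matrix $U = \{u(i,j) : 1 \le i,j \le N+M\}$, the function

$$C_{v,U,\epsilon_1,\epsilon_2}(\sigma,x) = -\sum_{i=1}^{N+M-1}\Bigl(\sum_{j=1}^{i}v_j\Bigr)u(\sigma(i),\sigma(i+1)) + \sum_{i=1}^{N+M}v_i I_{\sigma(i),\epsilon_1,\epsilon_2}(x_i)$$

when $N+M \ge 2$ and $C_{v,U,\epsilon_1,\epsilon_2}(\sigma,x) = I_{1,\epsilon_1,\epsilon_2}(x_1)$ when $N+M = 1$. Define also, for $z \in \mathbb{R}^d$,

$$\mathbb{J}_{U,\epsilon_1,\epsilon_2}(z) = \inf_{v}\ \inf_{x}\ \min_{\sigma}\ C_{v,U,\epsilon_1,\epsilon_2}(\sigma,x),\tag{4.2}$$

where the infima and the minimum range over the same sets as in the definition of $\mathbb{J}_U$.

We comment that when $N = 0$ and all $P(i) > 0$ for $i \in \mathcal{G}$, that $\mathbb{J}_{U,\epsilon_1,\epsilon_2} = \mathbb{J}_U$ for all $\epsilon_1,\epsilon_2$ small, so the following result already gives the desired upper bound.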
Proposition 4.4. For $\epsilon_1,\epsilon_2 > 0$, we have

$$\limsup_{n\to\infty}\frac1n\log\bar\mu_{\pi,\epsilon_1,\epsilon_2}(Z_n\in\Gamma,\ \mathbf{X}_n\text{ enters each } C_i \text{ at most once}) \le -\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}(\bar\Gamma\cap K).$$

The proof of the proposition is given in Section 8.

Limit estimate on $\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}$. The last step is to analyze $\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}$ as $\epsilon_1,\epsilon_2 \downarrow 0$ in the following proposition, which is proved in Section 10.

Proposition 4.5. We have

$$\limsup_{\epsilon_2\downarrow0}\ \limsup_{\epsilon_1\downarrow0}\ -\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}(\bar\Gamma\cap K) \le -\mathbb{J}_{\mathcal{U}_0}(\bar\Gamma).$$

Now, putting together the results above gives Theorem 3.1.

4.2. Lower bounds: proof of Theorem 3.2. The argument is similar in structure to the upper bound. To prove part (i), a reduction is first made with respect to initial ergodicity, which can be skipped if one is willing to assume that $\mathbb{P}_\pi$ satisfies the stronger Condition SIE-1 rather than just Condition SIE. Then a surgery of paths estimate, a homogeneous rest cost comparison and finally a coarse graining cost estimate are given. Last, having proved part (i), the second lower bound part (ii) is argued.

Let $\Gamma \subset \mathbb{R}^d$ be a Borel set. If $\Gamma^\circ = \emptyset$, the bound is trivial. Otherwise, let $x_0 \in \Gamma^\circ$ and $\Gamma_1 = B(x_0,a) \subset \Gamma^\circ$ be an open ball of radius $a > 0$.

SIE estimate. The following estimate shows that under Condition SIE, the first few transition kernels do not contribute effectively to lower bounds and, in particular, Condition SIE may be replaced with Condition SIE-1. When $\mathbb{P}_\pi$ satisfies Condition SIE, let $P'_n = P_{n+n_1}$ for $n \ge 1$, and let $\pi'(l) = \mathbb{P}_\pi(X_{n_1} = l)$ for $l \in \Sigma$. Let also $\mathbb{P}'_{\pi'}$ be constructed with respect to $\{P'_n\}$ and distribution $\pi'$. Clearly, we have $n_0(\{P'_n\}) = 1$ and $\mathbb{P}'_{\pi'}$ satisfies Condition SIE-1.

Proposition 4.6. Let $\Gamma_2 = B(x_0,a/2)$ and suppose $\mathbb{P}_\pi$ satisfies Condition SIE. Then we have

$$\liminf_{n\to\infty}\frac1n\log\mathbb{P}_\pi(Z_n\in\Gamma_1) \ge \liminf_{n\to\infty}\frac1n\log\mathbb{P}'_{\pi'}(Z_n\in\Gamma_2).$$



Proof. Note that

$$\Bigl\|Z_n - \frac{n-n_1}{n}Z_{n_1+1}^n\Bigr\| \le \frac{c_1}{n},$$

where $c_1 = n_1\|f\|$. Then

$$\begin{aligned}\mathbb{P}_\pi(Z_n\in\Gamma_1) &\ge \mathbb{P}_\pi\bigl(((n-n_1)/n)Z_{n_1+1}^n\in B(x_0,a-c_1/n)\bigr)\\ &= \sum_{l\in\Sigma}\pi'(l)\,\mathbb{P}_{(n_1,l)}\bigl(((n-n_1)/n)Z_{n_1+1}^n\in B(x_0,a-c_1/n)\bigr)\\ &= \mathbb{P}'_{\pi'}\bigl(Z_{n-n_1}\in(n/(n-n_1))B(x_0,a-c_1/n)\bigr).\end{aligned}$$

The proposition now follows by simple calculations. □

In view of the last proposition, with regard to the standard lower bound methods, we may just as well assume that $\mathbb{P}_\pi$ satisfies Condition SIE-1 if Condition SIE already holds.

Surgery of paths estimate. We underestimate $\mathbb{P}_\pi$ by another measure $\check\mu_\pi$ whose connection transitions correspond to $\mathcal{T}_1$. Slightly different from the surgery for the upper bound, the paths focused on here are those which make at most one long visit to sets $\{C_i : i \in \mathcal{G}\}$, but travel between them in short trips through all $\{C_i : 1 \le i \le N+M\}$.
Let E(N, M) = (M -l)E (N,M) and recall the connecting weight 7 1 (n, 
for distinct i, j € Q [cf. (2.10)]. Define 

which picks the smallest weight in a traveling frame. 
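Define also $\check P_n = \{\check p_n(s,t)\}$ for $n \ge 1$ by

$$\check p_n(s,t) = \begin{cases}\gamma^0(n,(i,j)), & \text{for all } s\in C_i,\ t\in C_j \text{ and distinct } i,j\in\mathcal{G},\\ p_n(s,t), & \text{for } s,t\in C_i \text{ and } i\in\mathcal{G},\text{ or } s\in C_i,\ t\in C_j \text{ when } i \text{ or } j\in\mathcal{D}.\end{cases}$$

Let $\check\mu_\pi$ be made through CON with $\{\check P_n\}$ and $\pi$. In addition, for convenience, let

$$G_n = \{\mathbf{X}_n\text{ enters only }\{C_i : i\in\mathcal{G}\}\text{ with at most one visit to each set}\}.$$

Proposition 4.7. Let $\Gamma_3 = B(x_0,a/4)$ and suppose $\mathbb{P}_\pi$ satisfies Condition SIE-1. Then

$$\liminf_{n\to\infty}\frac1n\log\mathbb{P}_\pi(Z_n(f)\in\Gamma_2) \ge \liminf_{n\to\infty}\frac1n\log\check\mu_\pi(Z_n(f)\in\Gamma_3,\ G_n).$$

The proof is given in Section 6.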
Homogeneous rest cost comparison. As before, we compare $\check\mu_\pi$ with a measure $\underline{\mu}_\pi$, which replaces nonhomogeneous transitions within sets $C_i$ with limiting homogeneous transition weights.

Define $\underline{P}_n = \{\underline{p}_n(s,t)\}$ for $n \ge 1$ by

$$\underline{p}_n(s,t) = \begin{cases}\gamma^0(n,(i,j)), & \text{for all } s\in C_i,\ t\in C_j \text{ and distinct } i,j\in\mathcal{G},\\ p(s,t), & \text{for } s,t\in C_i \text{ and } i\in\mathcal{G},\\ 0, & \text{otherwise.}\end{cases}$$

Correspondingly, define $\underline{\mu}_\pi$ through CON with $\{\underline{P}_n\}$ and initial distribution $\pi$.

Proposition 4.8. Suppose $n_0(\{P_n\}) = 1$. Then we have

$$\liminf_{n\to\infty}\frac1n\log\check\mu_\pi(Z_n\in\Gamma_3,\ G_n) \ge \liminf_{n\to\infty}\frac1n\log\underline{\mu}_\pi(Z_n\in\Gamma_3,\ G_n).$$

The proof is given in Section 7.

Coarse graining estimate. Again, we bound the right-hand side above through a decomposition of visit times and locations.

Proposition 4.9. Let $\pi$ be SIE-1-positive. Then

$$\liminf_{n\to\infty}\frac1n\log\underline{\mu}_\pi(Z_n\in\Gamma_3,\ G_n) \ge -\mathbb{J}_{\mathcal{T}_1}(x_0).$$

The proof is given in Section 9.

Finally, since $x_0 \in \Gamma^\circ$ is arbitrary, we have that

$$\liminf_{n\to\infty}\frac1n\log\mathbb{P}_\pi(Z_n\in\Gamma^\circ) \ge -\inf_{z\in\Gamma^\circ}\mathbb{J}_{\mathcal{T}_1}(z)$$

and so part (i) is proved.

Proof of Theorem 3.2(ii). The following cost bound, proved in Section 11, is the key step.

Proposition 4.10. We have under Assumptions B or C that $\mathcal{T}_1 \ge \mathcal{T}_0$ and so $\mathbb{J}_{\mathcal{T}_1} \le \mathbb{J}_{\mathcal{T}_0}$.

Therefore, given the lower bound in part (i), the second part follows directly. □
5. Path surgery upper bound. The strategy of Proposition 4.2 is to compare the probability of a path which moves many times between sets with that of a respective rearranged path with fewer sojourns. To make estimates we need a few more definitions.

Let $\hat t(n)$ be the largest entry which connects upward with respect to the ordering of the sets $\{C_i\}$ in the canonical decomposition of $P$:

$$\hat t(n) = \max_{1\le j<i\le N+M}\hat t(n,(i,j)).$$

Observe that as movement up the tree is impossible in the limit or, more precisely, as $T_n(i,j)$ vanishes for $1 \le j < i \le N+M$, we have $\hat t(n) \to 0$ as $n \to \infty$.

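Define also for $\epsilon_1,\epsilon_2 > 0$, the matrix $P_{n,\epsilon_1,\epsilon_2} = \{p_{n,\epsilon_1,\epsilon_2}(s,t)\}$ by

$$p_{n,\epsilon_1,\epsilon_2}(s,t) = \begin{cases}\hat t(n,(i,j)), & \text{for } s\in C_i,\ t\in C_j \text{ and distinct } 1\le i,j\le N+M,\\ p_n(s,t;\epsilon_2), & \text{for } s,t\in C_i \text{ and } i\in\mathcal{G},\\ p_n(s,t;\epsilon_1), & \text{for } s,t\in C_i \text{ and } i\in\mathcal{D},\end{cases}$$

for $n \ge 1$. Form now through CON the measure $\nu_{\pi,\epsilon_1,\epsilon_2}$ with respect to initial distribution $\pi$ and transition matrices $\{P_{n,\epsilon_1,\epsilon_2}\}$.

Let also $p = \min\{\epsilon_1,\epsilon_2\}$ and observe that $p$ is no larger than the minimum transition probability within subblocks:

$$p \le \min_{1\le i\le N+M}\ \min_{s,t\in C_i}p_{n,\epsilon_1,\epsilon_2}(s,t).$$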

We now describe a procedure to cut paths into resting and traveling parts, which then are rearranged through a rearrangement map. Let $\mathbf{x}_n = (x_1,\ldots,x_n) \in \Sigma^n$ be a path of length $n \ge 2$. We say that $\mathbf{x}_n$ possesses a "switch" at time $1 \le i \le n-1$ if $x_i \in C_j$ and $x_{i+1} \in C_k$ for $j \ne k$. For a path $\mathbf{x}_n$ which switches $l \ge 1$ times, let $g_k(\mathbf{x}_n)$ be the time of the $k$th switch, where $1 \le k \le l$. Set also $g_0(\mathbf{x}_n) = 0$ and $g_{l+1}(\mathbf{x}_n) = n$.

Define now, for $1 \le k \le l$, the path segments between switch times, $J_k(\mathbf{x}_n) = (x_{g_{k-1}(\mathbf{x}_n)+1},\ldots,x_{g_k(\mathbf{x}_n)})$, and the remainder $J_{l+1}(\mathbf{x}_n) = (x_{g_l(\mathbf{x}_n)+1},\ldots,x_n)$. Define also that $J_{k,2}(\mathbf{x}_n) = (x_{g_{k-1}(\mathbf{x}_n)+2},\ldots,x_{g_k(\mathbf{x}_n)})$ when $g_k(\mathbf{x}_n) \ge g_{k-1}(\mathbf{x}_n)+2$.

In addition, let $C_{i_k}$ be the subset in which path $J_k$ lies for $1 \le k \le l+1$ and let $\mathbf{C}_l = \mathbf{C}_l(\mathbf{x}_n) = (C_{i_1},\ldots,C_{i_{l+1}})$ be the sequence of subsets visited, given in the order of visitation. Also, let $\|\mathbf{C}_l\|$ be the number of distinct elements in $\mathbf{C}_l$. We say $\mathbf{x}_n$ has no repeat visits if the sequence $\mathbf{C}_l$ contains no repetitions.

For $0 \le k \le n-1$ and $1 \le j \le N+M$, define the sets

$$A_n(k) = \{\mathbf{x}_n : \mathbf{x}_n\text{ switches } k \text{ times}\}$$

and

$$A'_n(j) = \{\mathbf{x}_n : \mathbf{x}_n\text{ switches } j \text{ times, with no repeat visits}\}.$$
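The cutting procedure is easy to state in code. The sketch below computes the switch times $g_k$, the segments $J_k$ and the visiting sequence of set indices for a path; the function names and the dictionary `set_of`, which maps a state to the index of the set containing it, are illustrative assumptions.

```python
def cut_path(path, set_of):
    """Split a path into maximal segments lying in a single set.

    path   : list of states x_1, ..., x_n
    set_of : dict mapping each state to the index i of the set C_i containing it
    Returns (switch_times, segments, visited) with 1-based switch times g_1..g_l,
    segments J_1..J_{l+1} and the set indices i_1..i_{l+1} of the segments.
    """
    switch_times = [i + 1 for i in range(len(path) - 1)
                    if set_of[path[i]] != set_of[path[i + 1]]]
    bounds = [0] + switch_times + [len(path)]          # g_0 = 0, g_{l+1} = n
    segments = [path[bounds[k]:bounds[k + 1]] for k in range(len(bounds) - 1)]
    visited = [set_of[seg[0]] for seg in segments]
    return switch_times, segments, visited

def has_no_repeat_visits(visited):
    """True when no set index occurs twice in the visiting sequence."""
    return len(set(visited)) == len(visited)

# Hypothetical example: states a, b in C_1, state c in C_2, state d in C_3.
set_of = {"a": 1, "b": 1, "c": 2, "d": 3}
times, segs, visited = cut_path(["a", "b", "c", "c", "a", "d"], set_of)
```

In this example the path switches three times and revisits $C_1$, so it belongs to $A_n(3)$ but not to $A'_n(3)$.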
When there are at least two sets, $N+M \ge 2$, we define the map

$$\sigma_l : A_n(l) \to \bigcup_{j=1}^{\min\{N+M-1,\,l\}}A'_n(j)$$

for $l \ge 1$, in the following steps.

1. Let $\mathbf{x}_n \in A_n(l)$. Let $s_{\|\mathbf{C}_l\|} = l+1$ and $s_{\|\mathbf{C}_l\|-1} = l$. Inductively define, for $k < \|\mathbf{C}_l\|$,

$$s_k = \max\{j : C_{i_j} \notin \{C_{i_{s_{k+1}}}, C_{i_{s_{k+2}}},\ldots,C_{i_{s_{\|\mathbf{C}_l\|}}}\}\}.$$

In words, $C_{i_{s_1}},\ldots,C_{i_{s_{\|\mathbf{C}_l\|}}}$ are the $\|\mathbf{C}_l\|$ distinct subsets visited in reverse order starting from the last state of $\mathbf{x}_n$.

2. For $1 \le k \le \|\mathbf{C}_l\|$, let $J_{a_1^k},\ldots,J_{a_{d_k}^k}$, where $a_1^k < \cdots < a_{d_k}^k = s_k$, be the $d_k \ge 1$ paths which lie in $C_{i_{s_k}}$.

3. Define

$$\sigma_l(\mathbf{x}_n) = \bigl(J_{a_1^1},\ldots,J_{a_{d_1}^1},\ \ldots,\ J_{a_1^{\|\mathbf{C}_l\|}},\ldots,J_{a_{d_{\|\mathbf{C}_l\|}}^{\|\mathbf{C}_l\|}}\bigr).$$

In words, $\sigma_l$ rearranges the paths that correspond to distinct subsets so that the reverse visiting order is preserved. We comment that the last path $J_{l+1}$ is preserved under $\sigma_l$ and that $\sigma_1$ is the identity map.

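Example 1. Suppose $N+M = 8$ and $\mathbf{x}_n \in A_n(25)$, where

$$\mathbf{C}_{25} = (C_8, C_6, C_8, C_7, C_5, C_7, C_6, C_5, C_6, C_4, C_2, C_4, C_3, C_1, C_3, C_1, C_2, C_1, C_6, C_7, C_5, C_4, C_2, C_5, C_2, C_4).$$

Here, $\|\mathbf{C}_{25}\| = 8$, $s_1 = 3$, $s_2 = 15$, $s_3 = 18$, $s_4 = 19$, $s_5 = 20$, $s_6 = 24$, $s_7 = 25$ and $s_8 = 26$. Then

$$(C_{i_{s_1}}, C_{i_{s_2}}, C_{i_{s_3}}, C_{i_{s_4}}, C_{i_{s_5}}, C_{i_{s_6}}, C_{i_{s_7}}, C_{i_{s_8}}) = (C_8, C_3, C_1, C_6, C_7, C_5, C_2, C_4)$$

and

$$\sigma_{25}(\mathbf{x}_n) = (J_1, J_3, J_{13}, J_{15}, J_{14}, J_{16}, J_{18}, J_2, J_7, J_9, J_{19}, J_4, J_6, J_{20}, J_5, J_8, J_{21}, J_{24}, J_{11}, J_{17}, J_{23}, J_{25}, J_{10}, J_{12}, J_{22}, J_{26}).$$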
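The rearrangement map can be sketched as a grouping of segments by set, with the sets ordered by the time of their last visit. The code below is an illustrative reading of the three steps defining $\sigma_l$, not the authors' implementation; the visiting sequence is the one of Example 1 with the set indices written out, as read off from the index lists $s_k$ and the segment order given there.

```python
def rearrange(segments, visited):
    """Group segments by set, ordering the sets by the time of their last visit.

    segments : list of path segments J_1..J_{l+1}
    visited  : set index of each segment, i_1..i_{l+1}
    Within each set the segments keep their original relative order.
    """
    last_visit = {}
    for k, c in enumerate(visited):
        last_visit[c] = k                      # later visits overwrite earlier ones
    order = sorted(last_visit, key=last_visit.get)
    return [seg for c in order
            for seg, v in zip(segments, visited) if v == c]

# Set indices of the 26 segments in Example 1.
visited_example1 = [8, 6, 8, 7, 5, 7, 6, 5, 6, 4, 2, 4, 3,
                    1, 3, 1, 2, 1, 6, 7, 5, 4, 2, 5, 2, 4]
segments = [[k] for k in range(1, 27)]         # segment J_k represented by its label k
rearranged = rearrange(segments, visited_example1)
```

The rearranged path visits each set once, so it has at most $N+M-1$ switches, and its final segment is still $J_{26}$.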

Finally, we recall at this point useful versions of the "union of events" bound.

Lemma 5.1. Let $N \ge 1$ and let $\{a_n^i : i,n \ge 1\}$ be an array of nonnegative numbers. We have then

$$\limsup_{n\to\infty}\frac1n\log\sum_{i=1}^{N}a_n^i = \max_{1\le i\le N}\limsup_{n\to\infty}\frac1n\log a_n^i$$

and

$$\liminf_{n\to\infty}\frac1n\log\sum_{i=1}^{N}a_n^i = \liminf_{n\to\infty}\max_{1\le i\le N}\frac1n\log a_n^i \ge \max_{1\le i\le N}\liminf_{n\to\infty}\frac1n\log a_n^i.$$

In addition, let $\alpha \ge 1$ be an integer and let $\{\beta(n)\}$ be a sequence where $\beta(n) \le n^\alpha$ for $n \ge 1$. Then

$$\limsup_{n\to\infty}\frac1n\log\sum_{i=1}^{\beta(n)}a_n^i = \limsup_{n\to\infty}\max_{1\le i\le\beta(n)}\frac1n\log a_n^i$$

with the same equality when lim inf replaces lim sup.

See [10], Lemma 1.2.15, for the "limsup" proof. The other statements follow similarly.
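The first statement of the lemma can be seen numerically on exponentially decaying terms. The sketch below takes $a_n^i = e^{-nc_i}$ for hypothetical rates $c_i$ and compares $(1/n)\log\sum_i a_n^i$ with $\max_i(1/n)\log a_n^i$; the gap is at most $(\log N)/n$. Names and rate values are assumptions made for illustration.

```python
import math

def log_rate_of_sum(rates, n):
    """(1/n) * log(sum_i exp(-n * c_i)), computed stably via log-sum-exp."""
    exponents = [-n * c for c in rates]
    m = max(exponents)
    return (m + math.log(sum(math.exp(e - m) for e in exponents))) / n

def log_rate_of_max(rates):
    """max_i (1/n) * log(exp(-n * c_i)) = -min_i c_i, independent of n."""
    return max(-c for c in rates)

rates = [0.7, 0.2, 1.5]                     # hypothetical decay rates c_i
sizes = (10, 100, 1000)
gaps = [log_rate_of_sum(rates, n) - log_rate_of_max(rates) for n in sizes]
```

The entries of `gaps` are nonnegative and shrink with `n`, in line with the statement that only the largest term matters on the exponential scale.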

Proof of Proposition 4.2. As $P_n \le P_{n,\epsilon_1,\epsilon_2}$ elementwise, we have

$$\mathbb{P}_\pi(Z_n\in\Gamma) \le \nu_{\pi,\epsilon_1,\epsilon_2}(Z_n\in\Gamma).$$

Now consider the case $N+M = 1$ when $P$ corresponds to one irreducible set $C_1 = \Sigma$. Trivially in this case $\mathbf{X}_n$ does not leave $C_1$, so more than one switch is impossible. Therefore, the upper bound statement holds immediately.

We now assume that $N+M \ge 2$. By Lemma 5.1,

$$\limsup_{n\to\infty}\frac1n\log\mathbb{P}_\pi(Z_n\in\Gamma) \le \max_{x_0:\,\pi(x_0)>0}\ \limsup_{n\to\infty}\frac1n\log\nu_{x_0,\epsilon_1,\epsilon_2}(Z_n\in\Gamma).\tag{5.1}$$

Hence, it suffices to focus on $\nu_{x_0,\epsilon_1,\epsilon_2}$ for a given $x_0 \in \Sigma$ such that $\pi(x_0) > 0$.

The main idea exploited now is that for a realization $\mathbf{X}_n$ which switches between sets $\{C_i\}$ many times there will be guaranteed a large number of these switches "up the tree" between sets $C_i$ and $C_j$ for $i > j$ whose chance is small, and so such paths are unlikely. For notational simplicity, we now suppress $\epsilon_1$ and $\epsilon_2$ subscripts.

Step 1. Decompose according to the number of switches:

$$\nu_{x_0}(Z_n\in\Gamma) = \sum_{i=0}^{n-1}\nu_{x_0}(Z_n\in\Gamma,\ A_n(i)).\tag{5.2}$$

Step 2. Let $l \ge 1$ and let $\mathbf{x}_n \in \{Z_n\in\Gamma\}\cap A_n(l)$. Let also $\mathbf{y}_n \in \sigma_l^{-1}(\sigma_l(\mathbf{x}_n))$, that is, $\mathbf{y}_n$ is a path with $l$ switches which rearranges to $\sigma_l(\mathbf{x}_n)$. As $\mathbf{y}_n = (J_1(\mathbf{y}_n),\ldots,J_{l+1}(\mathbf{y}_n))$, where $J_k(\mathbf{y}_n)$ is a path in $C_{i_k}$ for $1 \le k \le l+1$, we have

$$\nu_{x_0}(\mathbf{X}_n = \mathbf{y}_n) = \nu_{x_0}(\mathbf{X}_1^{g_1} = J_1)\times\prod_{1\le k\le l}\nu_{(g_k+1,\,y_{g_k+1})}(\mathbf{X}_{g_k+2}^{g_{k+1}} = J_{k+1,2})\prod_{k=1}^{l}\hat t(g_k+1,(i_k,i_{k+1})),\tag{5.3}$$

where $g_k = g_k(\mathbf{y}_n)$ and $J_{k+1,2} = J_{k+1,2}(\mathbf{y}_n)$ (defined above) are shortened for clarity.

We now bound the right-hand side of (5.3) by

$$(p_1(x_0,y_1)/1)\,\hat\mu_{x_0}(\mathbf{X}_n = \sigma_l(\mathbf{x}_n))\prod_{k=1}^{l}\hat t(g_k(\mathbf{y}_n)+1,(i_k,i_{k+1}))\times\prod_{k=1}^{\|\mathbf{C}_l\|-1}\gamma^{-1}(g_k(\sigma_l(\mathbf{x}_n))+1,(i_{s_k},i_{s_{k+1}}))\cdot(1/p)^{l-(\|\mathbf{C}_l\|-1)}.\tag{5.4}$$

The bound (5.4) is explained by first recalling that in $\sigma_l(\mathbf{x}_n)$ there are $\|\mathbf{C}_l\|-1$ connections between different sets $\{C_i\}$. Equation (5.3) is then multiplied and divided by corresponding connection probabilities with respect to $\hat\mu_{x_0}$ to give the $\prod\gamma^{-1}(\cdots)$ term. Second, the prefactor $(p_1(x_0,y_1)/1) \le 1$ arises in connecting $x_0$ to the first state of $\sigma_l(\mathbf{x}_n)$ with respect to $\hat\mu_{x_0}$ and noting the constant form of $\hat P_1$. Third, in forming $\sigma_l(\mathbf{x}_n)$ from $\mathbf{y}_n$, with respect to $\nu_{x_0}$, $l-\|\mathbf{C}_l\|+1$ connections between different sets are replaced by corresponding internal transition probabilities and divided by them. These $l-\|\mathbf{C}_l\|+1$ divisors are then underestimated by the product of $p$'s.



Step 3. We now bound further the product terms in (5.4). Consider the subproduct

$$\prod_{k=s_r}^{s_{r+1}-1}\hat t(g_k(\mathbf{y}_n)+1,(i_k,i_{k+1}))\tag{5.5}$$

whose factors correspond to transitions between sets in subsequence $(C_{i_{s_r}},\ldots,C_{i_{s_{r+1}}})$ for $1 \le r \le \|\mathbf{C}_l\|-1$. From this subsequence, we derive a smaller subsequence in the following algorithm.

1. Let $\beta_1^r$ be the smallest index $s_r+1 \le q \le s_{r+1}$ such that $C_{i_q} = C_{i_{s_{r+1}}}$.

2. If $\beta_1^r > s_r+1$, let $\beta_2^r$ be the smallest index $s_r+1 \le q \le \beta_1^r-1$ such that $C_{i_q} = C_{i_{\beta_1^r-1}}$. Otherwise, stop.

3. Continue iteratively: If $\beta_m^r > s_r+1$, let $\beta_{m+1}^r$ be the smallest index $s_r+1 \le q \le \beta_m^r-1$ such that $C_{i_q} = C_{i_{\beta_m^r-1}}$. Otherwise, stop. Recalling the definition of $s_r$, there are at most $\|\mathbf{C}_l\|-r$ distinct sets in the sequence $(C_{i_{s_r+1}},\ldots,C_{i_{s_{r+1}}})$. The above process finishes in $n(r) \le \|\mathbf{C}_l\|-r$ steps to find $\beta_{n(r)}^r = s_r+1$.

Example 2. With respect to the path $\mathbf{x}_n$ in Example 1, we consider the algorithm for $r = 1$. We saw that $s_1 = 3$ and $s_2 = 15$, and

$$(C_{i_{s_1}}, C_{i_{s_1+1}},\ldots,C_{i_{s_2}}) = (C_8, C_7, C_5, C_7, C_6, C_5, C_6, C_4, C_2, C_4, C_3, C_1, C_3).$$

Here, there are $n(1) = 4$ distinct sets and $\beta_1^1 = s_1+10$ is the smallest index so that $C_{i_q} = C_3$. Similarly, $\beta_2^1 = s_1+7$ is smallest, where $C_{i_q} = C_{i_{s_1+9}} = C_4$. Also, $\beta_3^1 = s_1+4$ and $\beta_4^1 = s_1+1$.
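The index-selection algorithm of Step 3 can be traced mechanically. The sketch below is an illustrative reading of items 1 to 3, run on the visiting sequence of Example 1 (set indices written out in full); the function name and argument conventions are assumptions.

```python
def backward_chain(visited, s_r, s_next):
    """Indices beta_1 > beta_2 > ... > beta_{n(r)} = s_r + 1 (all 1-based).

    visited     : set indices i_1..i_{l+1} of the segments (0-based Python list)
    s_r, s_next : consecutive last-visit indices s_r < s_{r+1}
    """
    def first_index(target, hi):
        # smallest q in [s_r + 1, hi] whose segment lies in the target set
        for q in range(s_r + 1, hi + 1):
            if visited[q - 1] == target:
                return q
        raise ValueError("target set not visited in range")

    betas = [first_index(visited[s_next - 1], s_next)]
    while betas[-1] > s_r + 1:
        prev = betas[-1]
        # the next target is the set of the segment just before position prev
        betas.append(first_index(visited[prev - 2], prev - 1))
    return betas

visited_example1 = [8, 6, 8, 7, 5, 7, 6, 5, 6, 4, 2, 4, 3,
                    1, 3, 1, 2, 1, 6, 7, 5, 4, 2, 5, 2, 4]
betas_r1 = backward_chain(visited_example1, 3, 15)    # r = 1: s_1 = 3, s_2 = 15
betas_r2 = backward_chain(visited_example1, 15, 18)   # r = 2: s_2 = 15, s_3 = 18
```

For $r = 1$ the chain has four entries, which correspond to the offsets $s_1+10$, $s_1+7$, $s_1+4$ and $s_1+1$ quoted in Example 2.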

By construction, the terms 
*(9s r (yn) + l,(v,i/r (r) )), 

%^; (r) (yn) + 1, (j^,^.,)), • • • ,^(yn) + 1, fe»/9[)) 
all appear as factors in (5.5). Also, by monotonicity of t(n, 

n(r)-l 

i(9s r (y n ) + l,(is r ,i^ n{r) )) II ^/3- +1 (yn) + l,(i/3- +1 ,i/3-)) 
(5.6) 

n(r)-l 

<i(# Sr (yn) + i,(v,«/r )) II *(5 f Sr (yn) + n(r)-fc + i,(^ +1 ,i / 3-)). 

k=l 

Also, by construction, the rth switch time between sets C,- and C,- 

? ' L Sr t s r _)_l 

in the rearranged path cr/(x n ) is less than the last time to switch to Q s 
in path y n : 

SV.(o7(x n )) < g ar (y n ). 

So, by monotonicity again, the right-hand side of (5.6) is bounded above 
by 7(<7r(o"z( x n)) + 1) (*s r )*s r+ i))- Also, in particular, it will be convenient to 
note the gross bound, because i(n, < 1 applies to those terms in (5.5) 
not covered by (5.6), that JT^ 1 " 1 *(0fc(yn) + h(h,h+i)) < 7(fr(^( x n)) + 
1, (is r , V+i)) and so 

I ^ [|Cj||-l 

(5.7) Y[i(g k (y n ) + l,(i k ,i k+1 ))< JJ 7(ffifcOK x n)) + 1, 
fc=i fe=i 
Step 4. We now consider cases when $l$ is large and small. Suppose first that $l$ is small, namely $l \le \|\mathbf{C}_l\|(\|\mathbf{C}_l\|-1)/2+N+M-1$. Then we have the bound, noting (5.3), (5.4) and (5.7), that

$$\nu_{x_0}(\mathbf{X}_n = \mathbf{y}_n) \le (1/p)^l\,\hat\mu_{x_0}(\mathbf{X}_n = \sigma_l(\mathbf{x}_n)).\tag{5.8}$$

Suppose now that $l$ is large, that is, $l > \|\mathbf{C}_l\|(\|\mathbf{C}_l\|-1)/2+N+M-1$. Since the chain can only make at most $N+M-1$ consecutive downward switches (i.e., from sets $C_i$ to $C_j$ for $i < j$), in $q > N+M-1$ switches there will be at least $\lfloor q/(N+M-1)\rfloor$ upward switches from sets $C_i$ to $C_j$ for $i > j$.

Since $n(k) \le \|\mathbf{C}_l\|-k$ and so $\sum_{k=1}^{\|\mathbf{C}_l\|-1}n(k) \le \|\mathbf{C}_l\|(\|\mathbf{C}_l\|-1)/2$, we see carefully in Step 3 that we take at most $\|\mathbf{C}_l\|(\|\mathbf{C}_l\|-1)/2$ factors from $\prod_{k=1}^{l}\hat t(g_k(\mathbf{y}_n)+1,(i_k,i_{k+1}))$ whose product is then dominated by $\prod_{k=1}^{\|\mathbf{C}_l\|-1}\gamma(g_k(\sigma_l(\mathbf{x}_n))+1,(i_{s_k},i_{s_{k+1}}))$. Hence, remaining in the original product are at least $l-\|\mathbf{C}_l\|(\|\mathbf{C}_l\|-1)/2$ uncommitted factors of which at least

$$\bar l = \lfloor(l-\|\mathbf{C}_l\|(\|\mathbf{C}_l\|-1)/2)/(N+M-1)\rfloor$$

correspond to upward transitions.

Then, using monotonicity of $\hat t(n)$, we have

$$\prod_{k=1}^{l}\hat t(g_k(\mathbf{y}_n)+1,(i_k,i_{k+1})) \le \prod_{k=1}^{\|\mathbf{C}_l\|-1}\gamma(g_k(\sigma_l(\mathbf{x}_n))+1,(i_{s_k},i_{s_{k+1}}))\prod_{j=1}^{\bar l}\hat t(j).$$

Furthermore, noting (5.3) and (5.4), we have, for $l$ large,

$$\nu_{x_0}(\mathbf{X}_n = \mathbf{y}_n) \le (1/p)^l\Bigl[\prod_{j=1}^{\bar l}\hat t(j)\Bigr]\hat\mu_{x_0}(\mathbf{X}_n = \sigma_l(\mathbf{x}_n)).\tag{5.9}$$

Step 5. We now estimate the size of the set $\sigma_l^{-1}(\sigma_l(\mathbf{x}_n))$. Observe that the ordering of states within the $l+1$ subpaths in $\sigma_l(\mathbf{x}_n)$ is preserved among the paths $\sigma_l^{-1}(\sigma_l(\mathbf{x}_n))$ with $l$ switches. Then, to overestimate $|\sigma_l^{-1}(\sigma_l(\mathbf{x}_n))|$, we need only to specify the sequence in which the pairwise distinct sets $C_{j_1} \ne C_{j_2} \ne \cdots \ne C_{j_{l+1}}$ are visited and how long each visit takes, since once the ordering of the sets and switch times are fixed, the arrangement within the $l+1$ subpaths is determined.

A simple overcount of this procedure yields that

$$|\sigma_l^{-1}(\sigma_l(\mathbf{x}_n))| \le \binom{n}{l}M^{l+1}.$$

Combining this count with (5.8) and (5.9) gives

$$\nu_{x_0}(\mathbf{X}_n\in\sigma_l^{-1}(\sigma_l(\mathbf{x}_n))) \le \begin{cases}\binom{n}{l}M^{l+1}p^{-l}\,\hat\mu_{x_0}(\mathbf{X}_n = \sigma_l(\mathbf{x}_n)), & \text{for } l \text{ small},\\[4pt] \binom{n}{l}M^{l+1}p^{-l}\Bigl[\prod_{j=1}^{\bar l}\hat t(j)\Bigr]\hat\mu_{x_0}(\mathbf{X}_n = \sigma_l(\mathbf{x}_n)), & \text{for } l \text{ large}.\end{cases}\tag{5.10}$$



Step 6. By Stirling's formula,

$$\frac1n\log\binom{n}{l} = o(1)-\frac{l}{n}\log\frac{l}{n}-\frac{n-l}{n}\log\frac{n-l}{n}.$$

With this estimate, we now analyze the factor $\binom{n}{l}M^{l+1}p^{-l}\prod_{j=1}^{\bar l}\hat t(j)$ in (5.10). We consider cases when $l = o(n)$ and when $l \le n$ is otherwise.
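The Stirling estimate for the binomial coefficient is easy to check numerically. The sketch below compares $(1/n)\log\binom{n}{l}$, computed through the log-gamma function, with the entropy expression on the right-hand side; the function names and test sizes are illustrative choices.

```python
import math

def log_binom_rate(n, l):
    """(1/n) * log C(n, l), using log-gamma to avoid huge integers."""
    return (math.lgamma(n + 1) - math.lgamma(l + 1) - math.lgamma(n - l + 1)) / n

def entropy_rate(n, l):
    """-(l/n) log(l/n) - ((n-l)/n) log((n-l)/n), with the convention 0 log 0 = 0."""
    total = 0.0
    for x in (l / n, (n - l) / n):
        if x > 0:
            total -= x * math.log(x)
    return total

n = 10**6
gap_linear = abs(log_binom_rate(n, 3 * 10**5) - entropy_rate(n, 3 * 10**5))
rate_sublinear = log_binom_rate(n, 100)      # l = o(n): the rate itself is small
```

For $l$ proportional to $n$ the two expressions differ only by the $o(1)$ term, and for $l = o(n)$ the rate itself tends to zero, which is the first observation of Case 1.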

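Case 1. When $l = l_n = o(n)$, then $\log\binom{n}{l}/n \to 0$. Also, $M^{l_n} = e^{o(n)}$, $p^{-l_n} = e^{o(n)}$ and $\prod_{j=1}^{\bar l_n}\hat t(j) = e^{o(n)}$.

Case 2. When $l = l_n$ satisfies $\limsup l_n/n \ge \epsilon$ for some $0 < \epsilon \le 1$, let $n'$ be a maximal subsequence. Then $(\log\binom{n'}{l_{n'}})/n' = O(1)$, $(\log M^{l_{n'}})/n' \le 1+\log M$ and $(\log p^{-l_{n'}})/n' \le 1+\log p^{-1}$, but, as $\hat t(n') \downarrow 0$ and $\limsup\bar l_{n'}/n' \ge \epsilon/(N+M-1)$, we have $\log[\prod_{j=1}^{\bar l_{n'}}\hat t(j)]/n' \to -\infty$ as $n' \to \infty$.

Therefore, with respect to a $C_n = e^{o(n)}$, independent of $l \ge 1$ and the path, we have from (5.10) that

$$\nu_{x_0}(\mathbf{X}_n\in\sigma_l^{-1}(\sigma_l(\mathbf{x}_n))) \le C_n\,\hat\mu_{x_0}(\mathbf{X}_n = \sigma_l(\mathbf{x}_n)).$$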

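Step 7. Let $l \ge 1$, and let $\hat A_n(l) = \bigcup_{j=1}^{\min\{N+M-1,\,l\}}A'_n(j)$. Let also $\tilde A_n(l) = \sigma_l(\{Z_n\in\Gamma,\ A_n(l)\})$ and $\bar A_n(l) = \{Z_n\in\Gamma,\ \hat A_n(l)\}$. Since the average $Z_n$ is independent of the order of observations $\{X_1,\ldots,X_n\}$,

$$\tilde A_n(l) \subset \bar A_n(l)\quad\text{and}\quad\{Z_n\in\Gamma,\ A_n(l)\} = \sigma_l^{-1}\sigma_l(\{Z_n\in\Gamma,\ A_n(l)\}).$$

Then we can write

$$\begin{aligned}\nu_{x_0}(Z_n\in\Gamma,\ A_n(l)) &= \nu_{x_0}\bigl(\sigma_l^{-1}\sigma_l(\{Z_n\in\Gamma,\ A_n(l)\})\bigr)\\ &= \nu_{x_0}\Bigl(\mathbf{X}_n\in\bigcup_{\mathbf{x}_n\in\tilde A_n(l)}\sigma_l^{-1}(\mathbf{x}_n)\Bigr)\\ &\le \sum_{\mathbf{x}_n\in\tilde A_n(l)}\nu_{x_0}(\mathbf{X}_n\in\sigma_l^{-1}(\mathbf{x}_n))\\ &\le e^{o(n)}\sum_{\mathbf{x}_n\in\bar A_n(l)}\hat\mu_{x_0}(\mathbf{X}_n = \mathbf{x}_n)\\ &= e^{o(n)}\,\hat\mu_{x_0}(Z_n\in\Gamma,\ \hat A_n(l)).\end{aligned}$$

Step 8. Since $\bigcup_{l\ge1}\hat A_n(l)\cup A_n(0) \subset \{\mathbf{X}_n\text{ enters each } C_i \text{ at most once}\}$, we have

$$\sum_{i=0}^{n-1}\nu_{x_0}(Z_n\in\Gamma,\ A_n(i)) \le (1+(n-1)e^{o(n)})\,\hat\mu_{x_0}(Z_n\in\Gamma,\ \mathbf{X}_n\text{ enters each } C_i \text{ at most once}).\tag{5.11}$$

Then, noting (5.1), (5.2) and (5.11), we have

$$\limsup_{n\to\infty}\frac1n\log\mathbb{P}_\pi(Z_n\in\Gamma) \le \max_{x_0:\,\pi(x_0)>0}\ \limsup_{n\to\infty}\frac1n\log\hat\mu_{x_0}(Z_n\in\Gamma,\ \mathbf{X}_n\text{ enters each } C_i \text{ at most once}).$$

Applying Lemma 5.1 completes the proof. □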

6. Path surgery lower bound. The lower bound strategy is informed by 
the upper bound result. Namely, given the rearranged paths focused on in 
the upper bound surgery, we can more or less restrict to them and gain lower 
bounds. 

Proof of Proposition 4.7. When $N+M = 1$, $P$ is irreducible, $C_1 = \Sigma$ and $\mathcal{D} = \emptyset$. Then $\check P_n = P_n$ for all $n \ge 1$ and so $\mathbb{P}_\pi(Z_n\in\Gamma) = \check\mu_\pi(Z_n\in\Gamma)$. Also, as in the upper bound, $\mathbf{X}_n$ does not switch in this case. Hence, the lower bound holds trivially.

We now assume that $N+M \ge 2$. Consider the subset $B \subset \Sigma^n$ formed via the following procedure.

1. For $1 \le m \le N+M$, let $J_1, J_2,\ldots,J_m$ be subpaths that belong, respectively, to distinct sets $C_{i_1}, C_{i_2},\ldots,C_{i_m}$, where $\{i_1,\ldots,i_m\} \subset \mathcal{G}$. Let $j_i = |J_i|$, $J_i = (y_1^i,\ldots,y_{j_i}^i)$ and $J_{i,2} = (y_2^i,\ldots,y_{j_i}^i)$ when $j_i \ge 2$ for $1 \le i \le m$. We impose now that the lengths satisfy $\sum_{i=1}^{m}j_i = n-E(N,M)$.

2. When $m \ge 2$, we connect subpaths $J_s$ and $J_{s+1}$ for $s = 1,\ldots,m-1$ as follows. Let $0 \le k \le N+M-2$ be the number of sets entered in the connection and let, with $i = i_s$ and $j = i_{s+1}$, $Q_k$ and $V_k = \{x^{s,0},\ldots,x^{s,k+1}\}$ be as near (2.9). Denote $w^s = (x^{s,0},\ldots,x^{s,k+1})$ and $k_s = |w^s|$. Also, denote $b(s) = j_s+\sum_{i=1}^{s-1}(j_i+k_i)$. Let now $w^s$ be such that

$$\mathbb{P}_{(b(s),\,y_{j_s}^s)}\bigl(\mathbf{X}_{b(s)+1}^{b(s)+k_s+1} = (w^s,y_1^{s+1})\bigr) = \gamma^1(b(s)+1,(i_s,i_{s+1})).$$

Then, in particular, as $k_i \le E(N,M)$, we have

$$\gamma^1(b(s)+1,(i_s,i_{s+1})) \ge \gamma^0\Bigl(\sum_{i=1}^{s}j_i+1,(i_s,i_{s+1})\Bigr).$$

3. For m > 2, as S^Li &i < E(N,M), the length of the concatenation sat- 
isfies 

L=\(J 1 ,w 1 ,J 2) ...,w m -\j m }\ 

m—l 

n - E(N, M) + k i < n - 

i=l 



When m = 1, the length L = |(Ji)| = n-E(N,M). 

If now $L<n$, we then augment the last subpath $J_m$ by $n-L\le E(N,M)$ states in $C_{i_m}$. Specifically, define

$$J'_m=\begin{cases}J_m,&\text{if }L=n,\\(J_m,x^m_1,\dots,x^m_{n-L}),&\text{if }L<n,\end{cases}$$

where $(y^m_{j_m},x^m_1,\dots,x^m_{n-L})$ is a sequence of $n-L+1$ elements in $C_{i_m}$ with positive weight. Let also $J'_{m,2}=J_{m,2}$ when $L=n$ and $J'_{m,2}=(J_{m,2},x^m_1,\dots,x^m_{n-L})$ otherwise.
Now let

$$\mathbf{x}_n=\begin{cases}(J_1,w^1,\dots,w^{m-1},J'_m),&\text{when }m\ge2,\\(J'_1),&\text{when }m=1.\end{cases}$$

Finally, we define $B$ as the set of all such sequences $\mathbf{x}_n$ possible.
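Now write

$$\begin{aligned}
\mathbb{P}_\pi(Z_n\in\Gamma_2)&\ge\mathbb{P}_\pi(Z_n\in\Gamma_2,\ \mathbf{X}_n\in B)\\
&=\sum_{\mathbf{x}_n\in\{Z_n\in\Gamma_2\}\cap B}\mathbb{P}_\pi(\mathbf{X}_{j_1}=J_1)\,\gamma^1(j_1+1,y^1_{j_1},y^2_1)\\
&\qquad\times\mathbb{P}_{(j_1+k_1+1,\,y^2_1)}\big(\mathbf{X}_{j_1+k_1+2}^{b(2)}=J_{2,2}\big)\cdots\\
&\ge c(L)\,\mu_\pi\big(Z_{n-E(N,M)}\in\Gamma_{2,n},\ \mathbf{X}_{n-E(N,M)}\text{ only enters }\{C_i:i\in\mathcal{G}\}\\
&\qquad\qquad\text{with at most one visit to each set}\big),
\end{aligned}\qquad(6.1)$$

where

$$c(L)=\begin{cases}\mathbb{P}_{(b(m),\,y^m_{j_m})}\big(\mathbf{X}_{b(m)+1}^{b(m)+n-L}=(x^m_1,\dots,x^m_{n-L})\big),&\text{when }n>L,\\1,&\text{when }n=L,\end{cases}$$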



and 



2,71 



E(N,M)\\f\\ \ 



B[x 0> 



n — E(N, M) V 2 n / 

In the last step, we rewrote in terms of the measure $\mu_\pi$ by collapsing together the subpaths $\{J_i\}$. At the same time, since the collapsed path $(J_1,\dots,J_m)$ is of length $n-E(N,M)$, we correct the set $\Gamma_2$ to $\Gamma_{2,n}$.

We now estimate the prefactor $c(L)$. With respect to the minimum probability $p_{\min}$ [cf. (2.5)] and $n>L$ large, as $P_n\to P$, we can certainly bound

$$\mathbb{P}_{(b(m),\,y^m_{j_m})}\big(\mathbf{X}_{b(m)+1}^{b(m)+n-L}=(x^m_1,\dots,x^m_{n-L})\big)\ge(p_{\min}/2)^{E(N,M)}.$$

Therefore, $\lim(\log c(L))/n=0$.

Hence, the proposition follows by taking liminf in (6.1) and simple estimates. □

7. Homogeneous "rest cost" replacement. We replace certain a priori 
nonhomogeneous "resting" weights with homogeneous ones for both upper 
and lower bound estimates. 

Proofs of Propositions 4.3 and 4.8. The proofs follow as direct 
corollaries of the more general Proposition 7.1 below. □ 

Proposition 7.1. Let $\{B_n\}\subset\mathbb{R}^d$ be a sequence of Borel sets.
Upper bound. For $\epsilon_1,\epsilon_2>0$, we have

lim sup - log A7r,£i,£ 2 (X n G B n ) < lim sup - log pw, ei ,e 2 (X n G B n ). 
n n 

Lower bound. Suppose $n_0(\{P_n\})=1$. Then we have

liminf- log /^(X n G B n ) > liminf -log// (X n G B n ). 
n n — n 
Proof. We prove the lower bound part, because the upper bound estimate follows analogously and more simply. Let $G=\{(s,t):p(s,t)>0$ where $s,t\in C_i$ for $i\in\mathcal{G}\}$. As $P_n\to P$, the state space is finite and, by assumption $n_0=1$, there exists $a>0$ and a sequence $a\le m(k)\uparrow1$ such that $m(k)\le p_k(s,t)/p(s,t)$ for all $(s,t)\in G$ and $k\ge1$. Write now that



X! E vr(x ) n^( X i"l' 
xoGSx n S_B n i=l 



E E ^( x o) 



Pi(Xj_i,Xj) ^r^-^ i ^-p(xi_i,Xi) 



(xi_i,Xi)eG c 



> E E *(*o) 



p(Xi-l,Xi) — —p.{Xi-l,Xi) 



> 



Y[m(i) 



i=i 



(xi_i,ii)eG c 
E E ^o) 



(xi_i,Xi)GG 



p(Xi_i,Xi) -« 



X II Pi( x i-li x i) II Pj^i-l^i) 
(a;;_i,2;i)eG c (ij_i,s;j)6G 



8=1 



/i (X n €B n ). 



Indeed, for the first bound, we note, if (xj_i,Xj) ^ G, that pj(xj_i,Xj) = 
p .(xj_i,Xj) when (xj_i,Xj) connects distinct sets in Q, and ]5j(xj_i,Xj) > 
= p .{xi-i, Xi) otherwise. The second bound follows by monotonicity of 
$\{m(i)\}$.

Then the proposition lower bound follows as $(\sum_{i=1}^n\log m(i))/n\to0$. □
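The last step uses only that averages of a sequence tending to zero also tend to zero. As a small numerical sketch, assume the purely illustrative choice $m(i)=i/(i+1)$ (not taken from the paper), for which the sum telescopes to $-\log(n+1)$:

```python
import math

def average_log_factor(n):
    """(1/n) * sum_{i=1}^n log m(i) for the illustrative choice m(i) = i/(i+1)."""
    return sum(math.log(i / (i + 1)) for i in range(1, n + 1)) / n

# The averages shrink toward 0 even though each log m(i) is negative.
for n in (10, 1000, 100000):
    print(n, average_log_factor(n))
```

Any other sequence with $m(i)\uparrow1$ would behave the same way on the scale $1/n$; the closed form is just convenient for checking.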



8. Upper coarse graining bounds. The plan is to optimize over a coarse 
graining of the possible locations Z n visits in K and associated visit times. 
Some additional definitions which build on those in Section 5 are required 
in this effort. 

Define, for $1\le H\le N+M$ and $\mathbf{i}_H=(i_1,\dots,i_H)$ composed of distinct indices in $\{1,\dots,N+M\}$, that

$$C(\mathbf{i}_H)=\{\mathbf{X}_n\text{ starts in }C_{i_1}\text{ and enters successively }C_{i_2},\dots,C_{i_H}\}.$$

Also, let $k_0=0$, $k_H=n$ and, when $H\ge2$, let $1\le k_1<\cdots<k_{H-1}\le n-1$, and denote $\mathbf{k}_H=(k_0,\dots,k_H)$ and

$$S_n(\mathbf{k}_H)=\{\mathbf{X}_n\text{ switches at times }k_1,k_2,\dots,k_{H-1}\}.$$

Let also

$$\mathbf{v}_{\mathbf{k}_H}=(k_1/n,\,(k_2-k_1)/n,\,\dots,\,(n-k_{H-1})/n).$$

We now specify a certain cube decomposition. For $\mathbf{v}\in\Omega_H$ and $z\in\mathbb{R}^d$, recall the set $D(H,\mathbf{v},z)$ [cf. (2.6)] and let $D(H,\mathbf{v},B)=\bigcup_{z\in B}D(H,\mathbf{v},z)$ for sets $B\subset\mathbb{R}^d$.

Let now $F_1$ be the regular partition of $\mathbb{K}$ into $2^d$ closed cubes, $\{A^1_s:1\le s\le2^d\}$, whose interiors nonintersect and $\bigcup_sA^1_s=\mathbb{K}$. For $n\ge2$, let also $F_n$ be the regular refinement of $F_{n-1}$ into $2^{n-1}(2^d)$ closed cubes, $\{A^n_s:1\le s\le2^{n-1}(2^d)\}$, where also $\bigcup_sA^n_s=\mathbb{K}$. Observe also that the $(2^{n-1}(2^d))^H$ subcubes formed from $F_n$, $\{A(n,\mathbf{s})=A^n_{s_1}\times\cdots\times A^n_{s_H}:1\le s_i\le2^{n-1}(2^d)\}$, refine $\mathbb{K}^H$ as well.

For $B\subset\mathbb{K}$ and $j\ge1$, let

$$D_j(H,\mathbf{v},B)=\bigcup\{A(j,\mathbf{s}):A(j,\mathbf{s})\cap D(H,\mathbf{v},B)\ne\emptyset\}$$

be the nonempty union of all subcubes with respect to the $j$th partition which intersect $D(H,\mathbf{v},B)$. Let also

$$F(H,n,\mathbf{v},B)=\{\mathbf{s}:A(n,\mathbf{s})\subset D_n(H,\mathbf{v},B)\}.$$

For $a>0$, let $m_a$ be the first partition level $m$ so that, for each $1\le l\le N+M$, $|\mathbb{I}_{l,\epsilon_1,\epsilon_2}(x)-\mathbb{I}_{l,\epsilon_1,\epsilon_2}(y)|<a$ when $|x-y|\le\mathrm{diam}(A(m,\cdot))$ and $x,y\in Q_{l,\epsilon_1,\epsilon_2}$.
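To make the covering $D_j(H,\mathbf{v},B)$ concrete, here is a minimal sketch under simplifying assumptions that are not the paper's: $d=1$, $H=2$, $\mathbb{K}=[0,1]$, $B=[\mathrm{lo},\mathrm{hi}]$, and plain dyadic squares of side $2^{-m}$ (the paper's cube count per level may differ). The function name `covering_cubes` is hypothetical.

```python
import itertools

def covering_cubes(m, v, lo, hi):
    """Index pairs (s1, s2) of the level-m dyadic squares in [0,1]^2 that meet
    D = {(x1, x2) : lo <= v[0]*x1 + v[1]*x2 <= hi}, assuming v[0], v[1] >= 0."""
    h = 2.0 ** (-m)
    hits = []
    for s1, s2 in itertools.product(range(2 ** m), repeat=2):
        # The linear form ranges over [smallest, largest] on the closed square.
        smallest = v[0] * s1 * h + v[1] * s2 * h
        largest = v[0] * (s1 + 1) * h + v[1] * (s2 + 1) * h
        if smallest <= hi and largest >= lo:
            hits.append((s1, s2))
    return hits

cover = covering_cubes(2, (0.5, 0.5), 0.4, 0.6)
print(len(cover), "of", 4 ** 2, "squares meet D at level 2")
```

Refining the level can only shrink the covered region, which is the property the proof of Lemma 8.2 relies on when the cube diameters vanish.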

We also need the following technical lemmas, which can be skipped on 
first reading. 

Lemma 8.1. For distinct $i,j\in\mathcal{G}$, we have

$$\mathcal{U}_0(i,j)=\limsup_{n\to\infty}\frac1n\log\gamma(n,(i,j)).$$

Proof. Write the right-hand side as

$$\begin{aligned}
\limsup_{n\to\infty}\frac1n\log\gamma(n,(i,j))
&=\limsup_{n\to\infty}\max_{0\le k\le M-2}\max_{L_k}\sum_{s=0}^k\frac1n\log t(n+s,(l_s,l_{s+1}))\\
&=\max_{0\le k\le M-2}\max_{L_k}\sum_{s=0}^k\lim_{n\to\infty}\frac1n\log t(n,(l_s,l_{s+1}))\\
&=\max_{0\le k\le M-2}\max_{L_k}\sum_{s=0}^k\upsilon(l_s,l_{s+1})=\mathcal{U}_0(i,j),
\end{aligned}$$

where the second and third lines follow since the limit $\lim\log t(n,(k,l))/n=\upsilon(k,l)$ holds from Lemma 4.1. □

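In the next result, let $1\le H\le N+M$ and let $\Gamma\subset\mathbb{K}$ be a closed set. Let also $\mathbb{I}^\theta_l=\min\{\mathbb{I}_{l,\epsilon_1,\epsilon_2},\theta\}$ for $\theta\ge1$ and $1\le l\le N+M$.

Lemma 8.2. Let $\mathbf{v}^n\in\Omega_H$ be a convergent sequence, $\lim_n\mathbf{v}^n=\mathbf{v}\in\Omega_H$. Then, for any $\mathbf{i}_H$, we have

$$\limsup_{\theta\uparrow\infty}\limsup_{m\uparrow\infty}\limsup_{n\to\infty}\inf_{\mathbf{x}\in D_m(H,\mathbf{v}^n,\Gamma)}\sum_{j=1}^Hv^n_j\mathbb{I}^\theta_{i_j}(x_j)\ge\inf_{\mathbf{x}\in D(H,\mathbf{v},\Gamma)\cap\mathbb{K}^H}\sum_{j=1}^Hv_j\mathbb{I}_{i_j,\epsilon_1,\epsilon_2}(x_j).$$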

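Proof. Whereas $D_m(H,\mathbf{v}^n,\Gamma)\subset\mathbb{K}^H$ and is compact, we can find a convergent sequence $\mathbf{x}^{m,n_k}\in D_m(H,\mathbf{v}^{n_k},\Gamma)\to\mathbf{x}^m\in\mathbb{K}^H$ so that by lower semicontinuity of $\{\mathbb{I}^\theta_l\}$,

$$\limsup_{n\to\infty}\inf_{\mathbf{x}\in D_m(H,\mathbf{v}^n,\Gamma)}\sum_{j=1}^Hv^n_j\mathbb{I}^\theta_{i_j}(x_j)=\lim_{k\to\infty}\sum_{j=1}^Hv^{n_k}_j\mathbb{I}^\theta_{i_j}(x^{m,n_k}_j).$$

Now, out of $\{\mathbf{x}^m\}\subset\mathbb{K}^H$, let $\mathbf{x}^{m_k}\to\mathbf{x}\in\mathbb{K}^H$ be a convergent subsequence on which $\limsup_{m\uparrow\infty}\sum_{j=1}^Hv_j\mathbb{I}^\theta_{i_j}(x^m_j)$ is attained. Also observe that $\mathbb{I}^\theta_l(x_l)\to\mathbb{I}_{l,\epsilon_1,\epsilon_2}(x_l)$ for $1\le l\le N+M$ as $\theta\uparrow\infty$. Then, again by lower semicontinuity,

$$\begin{aligned}
\limsup_{\theta\uparrow\infty}\limsup_{m\uparrow\infty}\limsup_{n\to\infty}\inf_{\mathbf{x}\in D_m(H,\mathbf{v}^n,\Gamma)}\sum_{j=1}^Hv^n_j\mathbb{I}^\theta_{i_j}(x_j)
&\ge\limsup_{\theta\uparrow\infty}\sum_{j=1}^Hv_j\mathbb{I}^\theta_{i_j}(x_j)\\
&=\sum_{j=1}^Hv_j\mathbb{I}_{i_j,\epsilon_1,\epsilon_2}(x_j).
\end{aligned}$$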

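To finish the argument, we show that $\mathbf{x}\in D(H,\mathbf{v},\Gamma)\cap\mathbb{K}^H$. By construction, the diameters of the partitioning cubes $A(m,\cdot)$ uniformly vanish as $m\uparrow\infty$. As $D_m(H,\mathbf{v}^{n_k},\Gamma)$ is composed of cubes which intersect $D(H,\mathbf{v}^{n_k},\Gamma)$, we have that any point in $D_m(H,\mathbf{v}^{n_k},\Gamma)$ is at most a distance $\mathrm{diam}(A(m,\cdot))$ away from $D(H,\mathbf{v}^{n_k},\Gamma)\cap\mathbb{K}^H$. Hence, there are points $\mathbf{y}^{m,n_k}\in D(H,\mathbf{v}^{n_k},\Gamma)\cap\mathbb{K}^H$ such that $|\mathbf{x}^{m,n_k}-\mathbf{y}^{m,n_k}|\le\mathrm{diam}(A(m,\cdot))$. Let $\mathbf{y}^{m,n'_k}\to\mathbf{y}^m\in\mathbb{K}^H$ be a convergent subsequence. We have then $|\mathbf{x}^m-\mathbf{y}^m|\le\mathrm{diam}(A(m,\cdot))$. Now since $\Gamma$ is closed and $\sum_{j=1}^Hv^{n_k}_jy^{m,n_k}_j\in\Gamma$ for all $m,k$, we have

$$\sum_{j=1}^Hv_jx_j=\lim_{m}\sum_{j=1}^Hv_jy^{m}_j=\lim_{m}\lim_{k}\sum_{j=1}^Hv^{n'_k}_jy^{m,n'_k}_j\in\Gamma,$$

where the outer limit is taken along the subsequence $\{m_k\}$, and so $\mathbf{x}\in D(H,\mathbf{v},\Gamma)\cap\mathbb{K}^H$. □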

Proof of Proposition 4.4. When $N+M=1$, there is only one irreducible subset $C_1=\Sigma$ and $\mathcal{P}_{k,\epsilon_1,\epsilon_2}=P(1,\epsilon_1)$ for $k\ge2$. So, modulo a first transition (with respect to the constant matrix $\mathcal{P}_1$), the measure $\mu_\pi$ is a "homogeneous nonnegative process" with respect to $P(1,\epsilon_1)$. Also, whereas there can be no "repeat visits" and $\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}=\mathbb{I}_{1,\epsilon_1,\epsilon_2}$ in this case, the proposition follows from the LDP in Proposition 2.1.

We now assume that $N+M\ge2$. Also, to reduce notation we suppress subscripts $\epsilon_1$ and $\epsilon_2$ when there is no confusion in the following text.

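Step 1. Whereas $Z_n$ takes only values in the set $\mathbb{K}$, we have

$$\mu_{\pi,\epsilon_1,\epsilon_2}(Z_n\in\Gamma,\ \mathbf{X}_n\text{ enters each }C_j\text{ at most once})=\sum_{1\le H\le N+M}\sum_{\mathbf{i}_H}\mu_{\pi,\epsilon_1,\epsilon_2}(Z_n\in\Gamma\cap\mathbb{K},\ A'_n(H-1),\ C(\mathbf{i}_H)),\qquad(8.1)$$

where the sum on $\mathbf{i}_H$ is over $\binom{N+M}{H}H!$ possibilities.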

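Step 2. We first consider the case when "switching" actually occurs. Let $2\le H\le N+M$ and fix indices $\mathbf{i}_H$. Write, for $n>N+M$ (larger than the number of switches), that

$$\mu_\pi(Z_n\in\Gamma\cap\mathbb{K},\ A'_n(H-1),\ C(\mathbf{i}_H))=\sum_{\mathbf{k}_H}\mu_\pi(Z_n\in\Gamma\cap\mathbb{K},\ A'_n(H-1),\ C(\mathbf{i}_H),\ S_n(\mathbf{k}_H)),\qquad(8.2)$$

where the sum on $\mathbf{k}_H$ comprises $\binom{n-1}{H-1}$ possibilities.
For convenience, denote $B=\Gamma\cap\mathbb{K}$ and

$$E_n=A'_n(H-1)\cap C(\mathbf{i}_H)\cap S_n(\mathbf{k}_H).$$

Let also $a>0$ and let $m\ge m_a$. Recall from Section 2.4 that $Z_i^j\in\mathbb{K}$ for $i\le j$, and so we may write the summand in (8.2) equal to

$$\begin{aligned}
\mu_\pi\big((Z_1^{k_1},\dots,Z_{k_{H-1}+1}^{n})\in D(H,\mathbf{v}_{\mathbf{k}_H},B)\cap\mathbb{K}^H,\ E_n\big)
&\le\mu_\pi\big((Z_1^{k_1},\dots,Z_{k_{H-1}+1}^{n})\in D_m(H,\mathbf{v}_{\mathbf{k}_H},B),\ E_n\big)\\
&=\mu_\pi\Big((Z_1^{k_1},\dots,Z_{k_{H-1}+1}^{n})\in\bigcup_{\mathbf{s}}A(m,\mathbf{s}),\ E_n\Big)\\
&\le\sum_{\mathbf{s}}\mu_\pi\big(Z_1^{k_1}\in A^m_{s_1},\dots,Z_{k_{H-1}+1}^{n}\in A^m_{s_H},\ E_n\big),
\end{aligned}\qquad(8.3)$$

where the union and sum are over $\mathbf{s}\in F(H,m,\mathbf{v}_{\mathbf{k}_H},B)$.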

Step 3. For $1\le l\le N+M$, let $\eta_l$ be the uniform distribution on $C_l$ and let $\mathbb{P}_{l,\epsilon_1,\epsilon_2}$ denote the homogeneous nonnegative measure on $C_l$ formed from CON with $U_n=P(l,\epsilon_1,\epsilon_2)$ and initial distribution $\eta_l$. Let also $\theta>\max_{1\le l\le N+M}\max_{x\in Q_{l,\epsilon_1,\epsilon_2}}\mathbb{I}_{l,\epsilon_1,\epsilon_2}(x)$ be a number larger than the maxima of the rate functions on their domains of finiteness (cf. Proposition 2.1).

We now use the Markov property (2.2) and simple estimates to further 
bound the summand in (8.3) as 

^eA™..,^ l+1 eA™ 1 £„) 
^(^eA^xJ 1 inQJ 



U) 



x II IcijKfcj + i.Cv.v+i)) 



ff-1 



< n 7(fci+i,(v,ii+i)) n i^ii^s'^c^+ieA^j. 

3=1 j=o J+ 



Step 4. Recall the definition of $\mathbb{I}^\theta_l$ just before Lemma 8.2. Let
$c(k_{j+1}-k_j,\,A^m_{s_{j+1}},\,\theta,\,C_{i_{j+1}})$



TOi e a- 1 )ex P ((fc J+1 - «. +1 (a- J). 



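From homogeneous nonnegative large deviation upper bounds (cf. Proposition 2.1), uniformly over $H$, $\mathbf{k}_H$, the finite number of cubes $\mathbf{s}$ at level $m$, and $\mathbf{i}_H$, we have $c(k_{j+1}-k_j,\,A^m_{s_{j+1}},\,\theta,\,C_{i_{j+1}})\le e^{o(n)}$.

Also by monotonicity $\gamma(i+1,\dots)\le\gamma(i,\dots)$. Then we have (8.4) is less than

$$e^{o(n)}\prod_{j=1}^{H-1}\gamma(k_j,(i_j,i_{j+1}))\exp\Big\{-\sum_{j=1}^{H}(k_j-k_{j-1})\,\mathbb{I}^\theta_{i_j}(A^m_{s_j})\Big\}.\qquad(8.5)$$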
Step 5. At this point, we now bound the terms that correspond to no "switching" in (8.1), that is, when $H=1$. For $1\le i_1\le N+M$, we have

$$\mu_\pi(Z_n\in\Gamma\cap\mathbb{K},\ A'_n(0),\ C(i_1))\le e^{o(n)}\sum_{s_1\in F(1,m,1,B)}\exp\{-n\,\mathbb{I}^\theta_{i_1}(A^m_{s_1})\}.\qquad(8.6)$$

Step 6. It is convenient now to define $\gamma(0,(l,l'))=1$ for distinct $1\le l,l'\le N+M$. We combine (8.5) and (8.6) to bound (8.1) as

^{Zn S I\X n enters each set Cj at most once) 



< E EEE 

1<H<N+M i H k H s 



11 



n7(fci-i»(*j-i»»i)) 



[Note that (8.6) corresponds to index H = 1.] 

Since the sum over $\mathbf{s}\in F(H,m,\mathbf{v}_{\mathbf{k}_H},B)$ contains at most $(2^{m-1}(2^d))^H$ terms, we can apply Lemma 5.1 to obtain

limsup — log fin (Z n 6 r,X n enters each set d at most once) 
n 

< limsup max max max max 
(8.7) 1<H<N+M i H k H s 

Step 7. Now, by the choice of $\theta$, we have $\mathbb{I}^\theta_l=\mathbb{I}_l$ on $Q_l$ for $1\le l\le N+M$. Also, recall that $\mathbb{I}_l$ is uniformly continuous on $Q_l$ for $1\le l\le N+M$ (Proposition 2.1). Then, for $\mathbf{s}\in F(H,m,\mathbf{v}_{\mathbf{k}_H},B)$ such that $A(m,\mathbf{s})\cap(Q_{i_1}\times\cdots\times Q_{i_H})\ne\emptyset$, we have

1 



1 H 



— > (fc,- — kn-i) inf 



1 * 



xGA(m,s)nn^ =1 Qi i n j=l 
xeA( m ,s)nnn =1 Q H n j=l
H 



x€A m.s 77, ' J 



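On the other hand, if there exists $G\subset\{1,\dots,H\}$ such that $A^m_{s_j}\cap Q_{i_j}=\emptyset$ for all $j\in G$, we have that $\mathbb{I}^\theta_{i_j}(A^m_{s_j})=\inf_{x\in A^m_{s_j}}\mathbb{I}^\theta_{i_j}(x)=\theta$. Then, combining with (8.8), we have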



(8-9) = I E (** " + ± E(% - kj-i%(Al 



1 H 



xGA(m,s) 77 

With the estimate (8.9), we have that (8.7) is less than 

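$$\limsup_{n\to\infty}\max_{H}\max_{\mathbf{i}_H}\max_{\mathbf{k}_H}\Big\{\sum_{j=0}^{H-1}\frac1n\log\gamma(k_j,(i_j,i_{j+1}))-\inf_{\mathbf{x}\in D_m(H,\mathbf{v}_{\mathbf{k}_H},B)\cap\mathbb{K}^H}\sum_{j=1}^{H}\frac{k_j-k_{j-1}}{n}\,\mathbb{I}^\theta_{i_j}(x_j)\Big\}+a.\qquad(8.10)$$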



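Step 8. Without loss of generality, we may assume that the lim sup sequence in (8.10) occurs on a subsequence with fixed $1\le H\le N+M$, $\mathbf{i}_H$ and vectors $\mathbf{k}^n_H$, where

$$\mathbf{v}_{\mathbf{k}^n_H}=\mathbf{v}^n\to\mathbf{v}=(v_1,\dots,v_H)$$

and

$$\lim_{n\to\infty}\frac1n\log\gamma(k^n_j,(i_j,i_{j+1}))\text{ exists for }1\le j\le H.$$

Whereas values of $\theta$ and $m$ above a certain range are arbitrary, by Lemma 8.2 we have

$$\limsup_{\theta\uparrow\infty}\limsup_{m\uparrow\infty}\lim_{n\to\infty}\inf_{\mathbf{x}\in D_m(H,\mathbf{v}^n,B)}\sum_{j=1}^Hv^n_j\mathbb{I}^\theta_{i_j}(x_j)\ge\inf_{\mathbf{x}\in D(H,\mathbf{v},B)\cap\mathbb{K}^H}\sum_{j=1}^Hv_j\mathbb{I}_{i_j,\epsilon_1,\epsilon_2}(x_j).$$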
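Step 9. We now argue that

$$\lim_{n\to\infty}\frac1n\log\gamma(k^n_j,(i_j,i_{j+1}))\le\Big(\sum_{l=1}^jv_l\Big)\limsup_{n\to\infty}\frac1n\log\gamma(n,(i_j,i_{j+1})).\qquad(8.11)$$

Indeed, by definition $\sum_{l=1}^jv_l=\lim k^n_j/n$ for $1\le j\le H$. Then, whereas

$$\frac1n\log\gamma(k^n_j,(i_j,i_{j+1}))=\frac{k^n_j}{n}\cdot\frac{1}{k^n_j}\log\gamma(k^n_j,(i_j,i_{j+1})),$$

inequality (8.11) follows easily when $\sum_{l=1}^jv_l>0$ or $\limsup(\log\gamma(n,(i_j,i_{j+1})))/n>-\infty$, but in the exceptional case, (8.11) still holds: Whereas $\log\gamma(k^n_j,(i_j,i_{j+1}))\le0$, we have by the convention $0\cdot(-\infty)=0$ that

$$\lim_{n\to\infty}\frac1n\log\gamma(k^n_j,(i_j,i_{j+1}))\le0=0\cdot(-\infty)=\Big(\sum_{l=1}^jv_l\Big)\limsup_{n\to\infty}\frac1n\log\gamma(n,(i_j,i_{j+1})).$$

Therefore, we have

$$\begin{aligned}
\sum_{j=0}^{H-1}\lim_{n\to\infty}\frac1n\log\gamma(k^n_j,(i_j,i_{j+1}))
&\le1_{[H\ge2]}\sum_{j=0}^{H-1}\Big(\sum_{l=1}^jv_l\Big)\limsup_{n\to\infty}\frac1n\log\gamma(n,(i_j,i_{j+1}))\\
&=1_{[H\ge2]}\sum_{j=1}^{H-1}\Big(\sum_{l=1}^jv_l\Big)\mathcal{U}_0(i_j,i_{j+1})
\end{aligned}$$

from Lemma 8.1, where the indicator reflects that the right-hand side vanishes when $H=1$. So (8.10) is bounded above by

$$\begin{aligned}
&1_{[H\ge2]}\sum_{j=1}^{H-1}\Big(\sum_{l=1}^jv_l\Big)\mathcal{U}_0(i_j,i_{j+1})-\inf_{\mathbf{x}\in D(H,\mathbf{v},B)\cap\mathbb{K}^H}\sum_{j=1}^Hv_j\mathbb{I}_{i_j,\epsilon_1,\epsilon_2}(x_j)+a\\
&\quad\le-\min_{H}\min_{\mathbf{i}_H}\min_{\mathbf{v}}\Big\{-1_{[H\ge2]}\sum_{j=1}^{H-1}\Big(\sum_{l=1}^jv_l\Big)\mathcal{U}_0(i_j,i_{j+1})+\inf_{\mathbf{x}\in D(H,\mathbf{v},B)\cap\mathbb{K}^H}\sum_{j=1}^Hv_j\mathbb{I}_{i_j,\epsilon_1,\epsilon_2}(x_j)\Big\}+a\\
&\quad\le-\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}(B)+a=-\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}(\Gamma\cap\mathbb{K})+a.
\end{aligned}$$

Whereas $a$ is arbitrary, the proposition follows. □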
9. Lower coarse graining bounds. As with the lower surgery estimate, the plan is to restrict the process to conveniently chosen events to derive lower bounds. Recall the notation $A'_n(l)$, $\mathbf{i}_H$, $C(\mathbf{i}_H)$, $\mathbf{k}_H$ and $S_n(\mathbf{k}_H)$ from Sections 5 and 8. Also, for $l\in\mathcal{G}$, let $\mathbb{P}^l_{\eta_l}$ denote the homogeneous nonnegative measure on $C_l$ with transition matrix $P(l)$ and initial distribution $\eta_l$.

Proof of Proposition 4.9. Whereas $\pi$ is SIE-1 positive, let $e_l\in C_l$ be such that $\pi(e_l)>0$ for $l\in\mathcal{G}$. Now, when $M=1$, $\mathcal{G}=\{C_1\}$, $\mathbb{J}_{\mathcal{T}_1}=\mathbb{I}_1$ and on the set $G_n$, the process never leaves $C_1$. In this case,

!iv {Z n €T 3l G n )>F^ i (Z n €T 3 ) 

and the desired lower bound follows from Proposition 2.1. 
Suppose that $M\ge2$.

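Step 1. Let now $\mathbf{i}_M=(i_1,\dots,i_M)$, where $i_j\in\mathcal{G}$ for $1\le j\le M$, be a given ordering of the nondegenerate irreducible sets $\mathcal{G}$. Let also $\Omega^+_M=\{\mathbf{v}\in\Omega_M:v_i>0\text{ for }1\le i\le M\}$ be the set of positive measures and let $\mathbf{v}\in\Omega^+_M$. Define also $v(0)=0$ and $v(u)=\sum_{j=1}^uv_j$ for $1\le u\le M$ and, in addition, for $n$ large enough so that $\lfloor nv(u)\rfloor<\lfloor nv(u+1)\rfloor$ for $1\le u\le M-1$, that

$$\mathbf{k}^n=(\lfloor nv(1)\rfloor,\dots,\lfloor nv(M-1)\rfloor).$$

Then, for all large $n$,

$$\begin{aligned}
\mu_\pi(Z_n\in\Gamma_3,\ G_n)
&\ge\mu_\pi(Z_n\in\Gamma_3,\ A'_n(M-1),\ C(\mathbf{i}_M))\\
&=\sum_{\mathbf{k}_M}\mu_\pi(Z_n\in\Gamma_3,\ A'_n(M-1),\ C(\mathbf{i}_M),\ S_n(\mathbf{k}_M))\\
&\ge\mu_\pi(Z_n\in\Gamma_3,\ A'_n(M-1),\ C(\mathbf{i}_M),\ S_n(\mathbf{k}^n)).
\end{aligned}\qquad(9.1)$$

Step 2. Whereas $\Gamma_3$ is open, the set $D(M,\mathbf{v},\Gamma_3)$ [cf. (2.6)] is also open. Then, for $\mathbf{x}\in D(M,\mathbf{v},\Gamma_3)$, let $\epsilon>0$ be so small that the open cube $A^\epsilon(\mathbf{x})$ about $\mathbf{x}$ with side length $\epsilon$ is contained: $A^\epsilon(\mathbf{x})=\prod_{u=1}^MA^\epsilon(x_u)\subset D(M,\mathbf{v},\Gamma_3)$. Also, for simplicity, let

$$E_n=A'_n(M-1)\cap C(\mathbf{i}_M)\cap S_n(\mathbf{k}^n)$$

and

$$a_n(u,\mathbf{v})=\frac{1}{v_u}\cdot\frac{\lfloor nv(u)\rfloor-\lfloor nv(u-1)\rfloor}{n}$$

for $1\le u\le M$. Then (9.1) equals

$$\begin{aligned}
&\mu_\pi\big((a_n(1,\mathbf{v})Z_1^{\lfloor nv(1)\rfloor},\dots,a_n(M,\mathbf{v})Z_{\lfloor nv(M-1)\rfloor+1}^{n})\in D(M,\mathbf{v},\Gamma_3),\ E_n\big)\\
&\quad\ge\mu_\pi\big((a_n(1,\mathbf{v})Z_1^{\lfloor nv(1)\rfloor},\dots,a_n(M,\mathbf{v})Z_{\lfloor nv(M-1)\rfloor+1}^{n})\in A^\epsilon(\mathbf{x}),\ E_n\big).
\end{aligned}\qquad(9.2)$$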
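The switching times $\lfloor nv(u)\rfloor$ and the correction factors $a_n(u,\mathbf{v})$ are easy to tabulate. The following is a minimal sketch with exact rational arithmetic; the function names are hypothetical, and the index `k[0] = 0`, `k[M] = n` convention follows Section 8.

```python
from fractions import Fraction
from math import floor

def switch_times(n, v):
    """[0, floor(n v(1)), ..., floor(n v(M))] for cumulative weights v(u)."""
    cum = [Fraction(0)]
    for w in v:
        cum.append(cum[-1] + w)
    return [floor(n * c) for c in cum]

def scaling_factors(n, v):
    """a_n(u, v) = (floor(n v(u)) - floor(n v(u-1))) / (n v_u) for u = 1..M."""
    k = switch_times(n, v)
    return [Fraction(k[u] - k[u - 1], n) / v[u - 1] for u in range(1, len(v) + 1)]

v = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
print(switch_times(1003, v))
print([float(a) for a in scaling_factors(1003, v)])
```

Since each block length differs from $nv_u$ by less than 1, one has $|a_n(u,\mathbf{v})-1|<1/(nv_u)$, which is why the factors can be dropped at the cost of shrinking the cube in the next step.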
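Step 3. To make notation easier, we now get rid of the $a_n(u,\mathbf{v})$ terms at the cost of a further lower bound. Namely, because $Z_i^u\in\mathbb{K}$ is bounded for all $1\le i\le u$, and $a_n(u,\mathbf{v})\to1$ for $1\le u\le M$, we have for all $n$ large enough that

$$\begin{aligned}
&\big\{(Z_1^{\lfloor nv(1)\rfloor},Z_{\lfloor nv(1)\rfloor+2}^{\lfloor nv(2)\rfloor},\dots,Z_{\lfloor nv(M-1)\rfloor+2}^{n})\in A^{\epsilon/2}(\mathbf{x})\big\}\\
&\quad\subset\big\{(a_n(1,\mathbf{v})Z_1^{\lfloor nv(1)\rfloor},\dots,a_n(M,\mathbf{v})Z_{\lfloor nv(M-1)\rfloor+1}^{n})\in A^\epsilon(\mathbf{x})\big\}.
\end{aligned}$$

Therefore, dropping the superscript, $A(\mathbf{x})=A^{\epsilon/2}(\mathbf{x})$, we have for large $n$ that

$$(9.2)\ge\mu_\pi\big((Z_1^{\lfloor nv(1)\rfloor},Z_{\lfloor nv(1)\rfloor+2}^{\lfloor nv(2)\rfloor},\dots,Z_{\lfloor nv(M-1)\rfloor+2}^{n})\in A(\mathbf{x}),\ E_n\big).\qquad(9.3)$$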

Step 4. We now decompose (9.3) in terms of resting and routing transitions. Recall that the transition probability between states $x\in C_l$ and $y\in C_m$ at time $n$ with respect to $\mu_\pi$ equals $\gamma^0(n+1,(l,m))$ and does not depend on atoms $x$ and $y$.

Bound (9.3) below by

7r ( e ii)Ve h (i Z l ^ ^ > ^L™(l)J+2' " ' ' ^L»M>(Af-l)J+2) G A (x), 

-^[™>(«-i)J = ei u and ^L««(«-i)J+i = e ^+i for 2 < u < M - 1, E n ) 



M-l 



(9.4) = J] 7°(M«)J + -P^^r 1 )J G A(xi),A- LfM , (1)J =e i2 ; 

I 

([ni;(M-l)J+l,e i „)V zy Ln?;(«-l)J- 



X II P (L™(«-l)J+l, ei ,, )(^lni)(u-l)J+2 G A(aj u ),X[„„( u )j ~~ e ^+l, 



X 



u=2 

D'iU 



(L»w(M-l)J+l,ei„)^Ln«(Af-l)J 



(^(A/-1)I+2 GA (^))- 



Step 5. Observe, by definition, for distinct $i,j\in\mathcal{G}$, that

$$\liminf_{k\to\infty}\frac1k\log\gamma^0(k,(i,j))=\liminf_{k\to\infty}\frac1k\log\min_{0\le r\le E(N,M)}\gamma^1(k+r,(i,j))=\mathcal{T}_1(i,j).$$

Then, because large deviations of finite time-homogeneous irreducible chains are independent of the first and last observations, we have

$$\begin{aligned}
\liminf_{n\to\infty}\frac1n\log(9.4)
&\ge\sum_{u=1}^{M-1}\Big(\liminf_{n\to\infty}\frac{\lfloor nv(u)\rfloor+1}{n}\Big)\Big(\liminf_{n\to\infty}\frac{1}{\lfloor nv(u)\rfloor+1}\log\gamma^0(\lfloor nv(u)\rfloor+1,(i_u,i_{u+1}))\Big)\\
&\qquad-\sum_{u=1}^Mv_u\mathbb{I}_{i_u}(A(x_u))\\
&\ge\sum_{u=1}^{M-1}v(u)\mathcal{T}_1(i_u,i_{u+1})-\sum_{u=1}^Mv_u\mathbb{I}_{i_u}(x_u).
\end{aligned}\qquad(9.5)$$

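Step 6. Whereas $\mathbf{v}\in\Omega^+_M$, $\mathbf{x}\in D(M,\mathbf{v},\Gamma_3)$ and arrangement $\mathbf{i}_M$ composed of members in $\mathcal{G}$ are arbitrary, we have from (9.5) that

$$\liminf_{n\to\infty}\frac1n\log\mu_\pi(Z_n\in\Gamma_3,\ G_n)\ge\sup_{\mathbf{v}\in\Omega^+_M}\max_{\sigma\in\mathbb{S}_M}g(\mathbf{v},\sigma),\qquad(9.6)$$

where

$$g(\mathbf{v},\sigma)=\sum_{u=1}^{M-1}v(u)\mathcal{T}_1(\zeta_{\sigma(u)},\zeta_{\sigma(u+1)})-\inf_{\mathbf{x}\in D(M,\mathbf{v},\Gamma_3)}\sum_{u=1}^Mv_u\mathbb{I}_{\zeta_{\sigma(u)}}(x_u).$$

We now argue that we can replace $\Omega^+_M$ with the larger $\Omega_M$ in (9.6). In Lemma 9.1 below we show, for each $\sigma$, that $g(\cdot,\sigma)$ is lower semicontinuous as a function on $\Omega_M$. In particular, because $\mathbb{S}_M$ is a finite set, $\max_{\sigma\in\mathbb{S}_M}g(\cdot,\sigma)$ is lower semicontinuous. Therefore, by taking limits, we improve the bound in (9.6) to

$$\liminf_{n\to\infty}\frac1n\log\mu_\pi(Z_n\in\Gamma_3,\ G_n)\ge\sup_{\mathbf{v}\in\Omega_M}\max_{\sigma\in\mathbb{S}_M}g(\mathbf{v},\sigma),$$

which is identified as $-\inf_{z\in\Gamma_3}\mathbb{J}_{\mathcal{T}_1}(z)$. □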

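Lemma 9.1. Let $B\subset\mathbb{R}^d$ be an open set, and let $M\ge2$ and $\sigma\in\mathbb{S}_M$. Then $g(\cdot,\sigma):\Omega_M\to[-\infty,0]$ is lower semicontinuous.

Proof. Let $\{\mathbf{v}^n\}\subset\Omega_M$ be a sequence which converges, $\mathbf{v}^n\to\mathbf{v}$. Recalling our convention $0\cdot(-\infty)=0$, we note that $h_1(\mathbf{v})=\sum_{u=1}^{M-1}v(u)\mathcal{T}_1(\zeta_{\sigma(u)},\zeta_{\sigma(u+1)})$ is lower semicontinuous, so we need only to prove $h_2(\mathbf{v})=\inf_{\mathbf{y}\in D(M,\mathbf{v},B)}\sum_{u=1}^Mv_u\mathbb{I}_{\zeta_{\sigma(u)}}(y_u)$ is upper semicontinuous.

Let now $\mathbf{w}\in D(M,\mathbf{v},B)$. Because $B$ is open and $\mathbf{v}^n$ converges to $\mathbf{v}$, we must have $\mathbf{w}\in D(M,\mathbf{v}^n,B)$ for all large $n$. Then,

$$\limsup_{n\to\infty}h_2(\mathbf{v}^n)\le\limsup_{n\to\infty}\sum_{u=1}^Mv^n_u\mathbb{I}_{\zeta_{\sigma(u)}}(w_u)=\sum_{u=1}^Mv_u\mathbb{I}_{\zeta_{\sigma(u)}}(w_u).$$

However, because $\mathbf{w}\in D(M,\mathbf{v},B)$ is arbitrary, we have in fact that

$$\limsup_{n\to\infty}h_2(\mathbf{v}^n)\le\inf_{\mathbf{y}\in D(M,\mathbf{v},B)}\sum_{u=1}^Mv_u\mathbb{I}_{\zeta_{\sigma(u)}}(y_u)=h_2(\mathbf{v}).\qquad\Box$$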

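10. Limit estimate on $\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}$. The proof of Proposition 4.5 follows in two steps (Propositions 10.1 and 10.2). The first step is to take $\epsilon_1\downarrow0$ and estimate in terms of a quantity independent of degenerate transient sets $\mathcal{D}$ in Proposition 10.1. The second step is to let $\epsilon_2\downarrow0$ and recover $\mathbb{J}_{\mathcal{U}_0}$ in the limit in Proposition 10.2.

It will be helpful to reduce the expression $\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}$ for $\epsilon_1,\epsilon_2>0$ [cf. (4.2)]. Whereas $\mathbb{I}_{i,\epsilon_1,\epsilon_2}$ is degenerate around $f(i)$ for $i\in\mathcal{D}$ [cf. (4.1)], we can evaluate $\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}(B)$ for $B\subset\mathbb{R}^d$ and $N+M\ge2$ as

$$\min_{\sigma\in\mathbb{S}_{N+M}}\inf_{\mathbf{v}\in\Omega_{N+M}}\inf_{\mathbf{x}\in D'(\mathbf{v})}\Big\{-\sum_{i=1}^{N+M-1}\Big(\sum_{j=1}^iv_j\Big)\mathcal{U}_0(\sigma(i),\sigma(i+1))-\sum_{\sigma(i)\in\mathcal{D}}v_i\log\epsilon_1+\sum_{\sigma(i)\in\mathcal{G}}v_i\mathbb{I}_{\sigma(i),\epsilon_2}(x_i)\Big\},$$

where $D'(\mathbf{v})=\{\mathbf{x}\in D(N+M,\mathbf{v},B):x_i=f(\sigma(i))\text{ for }\sigma(i)\in\mathcal{D}\}$. When $N+M=1$, the formula collapses to $\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}=\mathbb{I}_{1,\epsilon_2}$.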

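We describe now an $\epsilon_2>0$ "perturbation" of $\mathbb{J}_{\mathcal{U}_0}$, where we replace rates $\mathbb{I}_i$ with $\mathbb{I}_{i,\epsilon_2}$ for $i\in\mathcal{G}$. Define, for Borel $B\subset\mathbb{R}^d$ and $M\ge2$, that

$$\mathbb{J}^{\epsilon_2}_{\mathcal{U}_0}(B)=\min_{\sigma\in\mathbb{S}_M}\inf_{\mathbf{v}\in\Omega_M}\inf_{\mathbf{x}\in D(M,\mathbf{v},B)}\Big\{-\sum_{i=1}^{M-1}\Big(\sum_{j=1}^iv_j\Big)\mathcal{U}_0(\zeta_{\sigma(i)},\zeta_{\sigma(i+1)})+\sum_{i=1}^Mv_i\mathbb{I}_{\zeta_{\sigma(i)},\epsilon_2}(x_i)\Big\}.$$

When $M=1$, let $\mathbb{J}^{\epsilon_2}_{\mathcal{U}_0}=\mathbb{I}_{1,\epsilon_2}$.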

We give now a triangle cost bound useful for the first step. 

Lemma 10.1. For distinct $i,j,k\in\mathcal{G}$,

$$\mathcal{U}_0(i,j)+\mathcal{U}_0(j,k)\le\mathcal{U}_0(i,k).$$

Proof. By definition, for some $k_1$ and distinct elements $L^1=(l^1_0=i,l^1_1,\dots,l^1_{k_1},l^1_{k_1+1}=j)$ we have $\mathcal{U}_0(i,j)=\sum_{s=0}^{k_1}\upsilon(l^1_s,l^1_{s+1})$. Similarly, we have for some $k_2$ and $L^2=(l^2_0=j,l^2_1,\dots,l^2_{k_2},l^2_{k_2+1}=k)$ that $\mathcal{U}_0(j,k)=\sum_{s=0}^{k_2}\upsilon(l^2_s,l^2_{s+1})$. Let now $T$ be the first index of an element in $L^1$ which belongs to $L^2$. Clearly, $1\le T\le k_1+1$. Call also $T'$ the index of this element in $L^2$.

Form now $L^3=(l^1_0,l^1_1,\dots,l^1_T,l^2_{T'+1},\dots,l^2_{k_2+1})$. From construction, $L^3$ is a list of distinct elements which we relabel as $L^3=(l^3_0,\dots,l^3_{k_3+1})$ for some $k_3$. Now, since $\upsilon(a,b)\le0$ for all distinct $a,b$, we have

$$\sum_{s=0}^{k_3}\upsilon(l^3_s,l^3_{s+1})\ge\sum_{s=0}^{k_1}\upsilon(l^1_s,l^1_{s+1})+\sum_{s=0}^{k_2}\upsilon(l^2_s,l^2_{s+1}).$$

However,

$$\mathcal{U}_0(i,k)=\max_{0\le k'\le M-2}\max_{L_{k'}}\sum_{s=0}^{k'}\upsilon(l_s,l_{s+1})\ge\sum_{s=0}^{k_3}\upsilon(l^3_s,l^3_{s+1})\ge\mathcal{U}_0(i,j)+\mathcal{U}_0(j,k).\qquad\Box$$
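The triangle bound can be checked numerically on small instances. The sketch below assumes a hypothetical weight matrix `w[a][b]` playing the role of $\upsilon(a,b)\le0$ (with $-\infty$ allowed), and computes the route cost by brute-force enumeration of simple routes; this is an illustration, not the paper's construction.

```python
import itertools
import random

NEG_INF = float("-inf")

def best_route_cost(w, i, j):
    """Max over simple routes i -> ... -> j of the summed weights w[a][b] <= 0."""
    others = [x for x in range(len(w)) if x not in (i, j)]
    best = NEG_INF
    for r in range(len(others) + 1):
        for mid in itertools.permutations(others, r):
            route = (i,) + mid + (j,)
            best = max(best, sum(w[a][b] for a, b in zip(route, route[1:])))
    return best

rng = random.Random(0)
n = 5
# Random nonpositive weights; roughly one edge in five is forbidden (-inf).
w = [[NEG_INF if rng.random() < 0.2 else -5 * rng.random() for _ in range(n)]
     for _ in range(n)]
ok = all(best_route_cost(w, i, j) + best_route_cost(w, j, k)
         <= best_route_cost(w, i, k) + 1e-12
         for i, j, k in itertools.permutations(range(n), 3))
print("triangle bound holds on this instance:", ok)
```

The small tolerance only guards against floating-point rounding; the inequality itself is exact, by the splicing argument in the proof.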

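Proposition 10.1. Let $B\subset\mathbb{K}$ be a compact set and fix $\epsilon_2>0$. Then, we have

$$\liminf_{\epsilon_1\downarrow0}\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}(B)\ge\mathbb{J}^{\epsilon_2}_{\mathcal{U}_0}(B).$$

Proof. First, when $N=0$, we inspect that $\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}(B)=\mathbb{J}^{\epsilon_2}_{\mathcal{U}_0}(B)$. Therefore, we assume that $N\ge1$ in the following procedure.

Step 1. Let $\epsilon(k)\downarrow0$, $\mathbf{v}^{\epsilon(k)}$, $\mathbf{x}^{\epsilon(k)}$ and $\sigma^{\epsilon(k)}$ be sequences so that the limit inferior is attained:

$$\begin{aligned}
\liminf_{\epsilon_1\downarrow0}\mathbb{J}_{\mathcal{U}_0,\epsilon_1,\epsilon_2}(B)
=\lim_{k\to\infty}\Big\{&-\sum_{i=1}^{N+M-1}\mathcal{U}_0(\sigma^{\epsilon(k)}(i),\sigma^{\epsilon(k)}(i+1))\Big[\sum_{j=1}^iv^{\epsilon(k)}_j\Big]\\
&-\sum_{\sigma^{\epsilon(k)}(i)\in\mathcal{D}}v^{\epsilon(k)}_i\log\epsilon(k)+\sum_{\sigma^{\epsilon(k)}(i)\in\mathcal{G}}v^{\epsilon(k)}_i\mathbb{I}_{\sigma^{\epsilon(k)}(i),\epsilon_2}(x^{\epsilon(k)}_i)\Big\}.
\end{aligned}\qquad(10.1)$$

Because $\Omega_{N+M}$ is compact and $\mathbb{S}_{N+M}$ is finite, a further subsequence may be found so that, with the same labels, $\mathbf{v}^{\epsilon(k)}\to\mathbf{v}$ and $\sigma^{\epsilon(k)}=\sigma$ for all small $\epsilon(k)$.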



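Step 2. When $\sum_{\sigma(i)\in\mathcal{D}}v_i>0$, we have (10.1) diverges to $\infty$, which is automatically greater than the right-hand side in the proposition. On the other hand, if $\sum_{\sigma(i)\in\mathcal{D}}v_i=0$, we must have $\sum_{\sigma(i)\in\mathcal{G}}v_i=1$, because $\mathbf{v}$ is a probability vector. Now, if $(10.1)=\infty$, the proposition bound again holds.

Suppose therefore that (10.1) is finite. Recall that cube $\mathbb{K}$ contains the domains of finiteness of the rate functions $\{\mathbb{I}_{i,\epsilon_2}:i\in\mathcal{G}\}$ (cf. Proposition 2.1). Therefore, by taking a subsequence and relabeling, we can take $\mathbf{x}^{\epsilon(k)}\in D'(\mathbf{v}^{\epsilon(k)})\cap\mathbb{K}^{N+M}$ and ensure the sequence is convergent, $\mathbf{x}^{\epsilon(k)}\to\mathbf{x}$. Moreover, $\mathbf{x}\in D(N+M,\mathbf{v},B)$ since $\sum_{i=1}^{N+M}v^{\epsilon(k)}_ix^{\epsilon(k)}_i\in B$ converges to $\sum_{i=1}^{N+M}v_ix_i$ and $B$ is closed.

Then, because $-\sum_{\sigma(i)\in\mathcal{D}}v^{\epsilon(k)}_i\log\epsilon(k)\ge0$ and the rate functions $\mathbb{I}_{i,\epsilon_2}$ are lower semicontinuous, we have that

$$\begin{aligned}
(10.1)&\ge\liminf_{k\to\infty}\Big\{-\sum_{i=1}^{N+M-1}\mathcal{U}_0(\sigma(i),\sigma(i+1))\Big[\sum_{j=1}^iv^{\epsilon(k)}_j\Big]+\sum_{\sigma(i)\in\mathcal{G}}v^{\epsilon(k)}_i\mathbb{I}_{\sigma(i),\epsilon_2}(x^{\epsilon(k)}_i)\Big\}\\
&\ge-\sum_{i=1}^{N+M-1}\mathcal{U}_0(\sigma(i),\sigma(i+1))\Big[\sum_{j=1}^iv_j\Big]+\sum_{\sigma(i)\in\mathcal{G}}v_i\mathbb{I}_{\sigma(i),\epsilon_2}(x_i).
\end{aligned}\qquad(10.2)$$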



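Step 3. When $M=1$ and $N\ge1$, then $\mathcal{G}=\{C_1\}$ is a singleton and $v_{\sigma^{-1}(1)}=1$. Moreover, whereas $-\mathcal{U}_0$ is nonnegative, (10.2) is bounded below by $\mathbb{I}_{1,\epsilon_2}(x_{\sigma^{-1}(1)})\ge\mathbb{J}^{\epsilon_2}_{\mathcal{U}_0}(B)$ to finish the proof in this case.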

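Step 4. Suppose then that $M\ge2$ and $N\ge1$. The strategy is to form a permutation $\eta\in\mathbb{S}_M$ and vector $\mathbf{u}\in\Omega_M$ for which (10.2) reduces to an expression that involves only terms that relate to $\mathcal{G}$. Write $\sigma^{-1}(\mathcal{G})=\{\chi_1,\dots,\chi_M\}$, where $\chi_i$ is ordered as follows:

$$\chi_1=\min\{s:\sigma(s)\in\mathcal{G}\}\quad\text{and}\quad\chi_i=\min\{s>\chi_{i-1}:\sigma(s)\in\mathcal{G}\}\text{ when }2\le i\le M.$$

Now, whereas $v_i=0$ for $\sigma(i)\notin\mathcal{G}$ and, in particular, $v_j=0$ for $1\le j\le\chi_1-1$ when $\chi_1\ge2$, we have

$$\begin{aligned}
\sum_{i=1}^{N+M-1}\mathcal{U}_0(\sigma(i),\sigma(i+1))\Big[\sum_{j=1}^iv_j\Big]
&=\sum_{i=\chi_1}^{N+M-1}\mathcal{U}_0(\sigma(i),\sigma(i+1))\Big[\sum_{\chi_1\le j\le i}v_j\Big]\\
&=\sum_{k=1}^{M-1}\sum_{i=\chi_k}^{\chi_{k+1}-1}\mathcal{U}_0(\sigma(i),\sigma(i+1))\Big[\sum_{\chi_1\le j\le i}v_j\Big]\\
&\qquad+\sum_{i=\chi_M}^{N+M-1}\mathcal{U}_0(\sigma(i),\sigma(i+1))\Big[\sum_{\chi_1\le j\le i}v_j\Big],
\end{aligned}$$

where the last sum is present when $\chi_M<N+M$ and is read as 0 when $\chi_M=N+M$.

In any case, because $-\mathcal{U}_0$ is nonnegative, we have that

$$-\sum_{i=1}^{N+M-1}\mathcal{U}_0(\sigma(i),\sigma(i+1))\Big[\sum_{j=1}^iv_j\Big]\ge-\sum_{k=1}^{M-1}\Big[\sum_{i=\chi_k}^{\chi_{k+1}-1}\mathcal{U}_0(\sigma(i),\sigma(i+1))\sum_{\chi_1\le j\le i}v_j\Big].\qquad(10.3)$$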



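Step 5. We now bound individually the terms in large brackets in (10.3). For each $\chi_k\le i\le\chi_{k+1}-1$, as $\{v_j:\chi_1\le j\le i\text{ and }j\in\sigma^{-1}(\mathcal{G})\}=\{v_{\chi_s}:1\le s\le k\}$, we may write

$$\begin{aligned}
\sum_{i=\chi_k}^{\chi_{k+1}-1}\mathcal{U}_0(\sigma(i),\sigma(i+1))\sum_{\chi_1\le j\le i}v_j
&=\big[\mathcal{U}_0(\sigma(\chi_k),\sigma(\chi_k+1))+\mathcal{U}_0(\sigma(\chi_k+1),\sigma(\chi_k+2))\\
&\qquad+\cdots+\mathcal{U}_0(\sigma(\chi_{k+1}-1),\sigma(\chi_{k+1}))\big]\sum_{s=1}^kv_{\chi_s}\\
&\le\mathcal{U}_0(\sigma(\chi_k),\sigma(\chi_{k+1}))\sum_{s=1}^kv_{\chi_s}
\end{aligned}$$

by repeatedly applying the triangle inequality Lemma 10.1.
Hence, pulling together the inequalities, we have

$$-\sum_{i=1}^{N+M-1}\mathcal{U}_0(\sigma(i),\sigma(i+1))\Big[\sum_{j=1}^iv_j\Big]\ge-\sum_{k=1}^{M-1}\mathcal{U}_0(\sigma(\chi_k),\sigma(\chi_{k+1}))\Big[\sum_{s=1}^kv_{\chi_s}\Big].\qquad(10.4)$$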



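Step 6. Define now $\mathbf{u}\in\Omega_M$ by $u_k=v_{\chi_k}$ for $1\le k\le M$. Then

$$\sum_{i\in\sigma^{-1}(\mathcal{G})}v_i\mathbb{I}_{\sigma(i),\epsilon_2}(x_i)=\sum_{k=1}^Mv_{\chi_k}\mathbb{I}_{\sigma(\chi_k),\epsilon_2}(x_{\chi_k})=\sum_{k=1}^Mu_k\mathbb{I}_{\sigma(\chi_k),\epsilon_2}(x_{\chi_k}).$$

Now let $\eta\in\mathbb{S}_M$ be the permutation where $\zeta_{\eta(i)}=\sigma(\chi_i)$ for $1\le i\le M$. Noting (10.4), we can then bound (10.2) below by

$$\begin{aligned}
&-\sum_{k=1}^{M-1}\mathcal{U}_0(\sigma(\chi_k),\sigma(\chi_{k+1}))\Big[\sum_{s=1}^kv_{\chi_s}\Big]+\sum_{k=1}^Mv_{\chi_k}\mathbb{I}_{\sigma(\chi_k),\epsilon_2}(x_{\chi_k})\\
&\quad=-\sum_{k=1}^{M-1}\mathcal{U}_0(\zeta_{\eta(k)},\zeta_{\eta(k+1)})\Big[\sum_{s=1}^ku_s\Big]+\sum_{k=1}^Mu_k\mathbb{I}_{\zeta_{\eta(k)},\epsilon_2}(x_{\chi_k}).
\end{aligned}\qquad(10.5)$$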
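The bookkeeping in Steps 4 to 6 (locating the positions $\chi_k$ of nondegenerate sets in the ordering $\sigma$, and reading off $\eta$ and $\mathbf{u}$) can be sketched as follows. The labels, the helper name `collapse`, and the 0-based positions are illustrative assumptions, not the paper's notation.

```python
def collapse(sigma, good, v):
    """sigma: ordering (list) of all set labels; good: labels of nondegenerate sets;
    v: weights, assumed to vanish at positions whose label is not in `good`.
    Returns positions chi (0-based), the induced ordering eta, and the weights u."""
    chi = [s for s, label in enumerate(sigma) if label in good]
    eta = [sigma[s] for s in chi]
    u = [v[s] for s in chi]
    return chi, eta, u

sigma = ["d1", "c2", "d2", "c1"]        # hypothetical ordering of N + M = 4 sets
good = {"c1", "c2"}                      # the M = 2 nondegenerate sets
v = [0.0, 0.25, 0.0, 0.75]
x = [5.0, 1.0, 7.0, 2.0]

chi, eta, u = collapse(sigma, good, v)
# The weighted average is unchanged by dropping the zero-weight positions.
full = sum(vi * xi for vi, xi in zip(v, x))
reduced = sum(uk * x[s] for uk, s in zip(u, chi))
print(chi, eta, u, full, reduced)
```

The equality of `full` and `reduced` is the identity used in Step 7 to place $(x_{\chi_1},\dots,x_{\chi_M})$ in $D(M,\mathbf{u},B)$.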



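Step 7. By construction $\sum_{i=1}^{N+M}v_ix_i\in B$. Then, because $v_j=0$ when $\sigma(j)\notin\mathcal{G}$, we have

$$\sum_{j=1}^{N+M}v_jx_j=\sum_{j\in\sigma^{-1}(\mathcal{G})}v_jx_j=\sum_{s=1}^Mv_{\chi_s}x_{\chi_s}=\sum_{s=1}^Mu_sx_{\chi_s}$$

and so $(x_{\chi_1},\dots,x_{\chi_M})\in D(M,\mathbf{u},B)$. Hence, tracing through the argument,

$$\begin{aligned}
(10.5)&\ge\inf_{\mathbf{x}\in D(M,\mathbf{u},B)}\Big\{-\sum_{k=1}^{M-1}\mathcal{U}_0(\zeta_{\eta(k)},\zeta_{\eta(k+1)})\Big[\sum_{s=1}^ku_s\Big]+\sum_{k=1}^Mu_k\mathbb{I}_{\zeta_{\eta(k)},\epsilon_2}(x_k)\Big\}\\
&\ge\mathbb{J}^{\epsilon_2}_{\mathcal{U}_0}(B).
\end{aligned}$$

□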



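Proposition 10.2. Let $\Gamma\subset\mathbb{R}^d$ be compact. Then we have

$$\liminf_{\epsilon\downarrow0}\mathbb{J}^{\epsilon}_{\mathcal{U}_0}(\Gamma)\ge\mathbb{J}_{\mathcal{U}_0}(\Gamma).\qquad(10.6)$$

Proof. When $\liminf_{\epsilon\downarrow0}\mathbb{J}^{\epsilon}_{\mathcal{U}_0}(\Gamma)=\infty$, of course (10.6) is immediate.

Step 1. Suppose then that $\liminf_{\epsilon\downarrow0}\mathbb{J}^{\epsilon}_{\mathcal{U}_0}(\Gamma)<\infty$. As in Step 2 in Proposition 10.1, let $\epsilon(k)\downarrow0$, $\sigma^{\epsilon(k)}=\sigma$ independent of $k$, $\mathbf{v}^{\epsilon(k)}\to\mathbf{v}$ and $\mathbf{x}^{\epsilon(k)}\to\mathbf{x}\in D(M,\mathbf{v},\Gamma)$ be such that

$$\liminf_{\epsilon\downarrow0}\mathbb{J}^{\epsilon}_{\mathcal{U}_0}(\Gamma)=\lim_{k\to\infty}\Big\{-\sum_{i=1}^{M-1}\Big(\sum_{j=1}^iv^{\epsilon(k)}_j\Big)\mathcal{U}_0(\zeta_{\sigma(i)},\zeta_{\sigma(i+1)})+\sum_{i=1}^Mv^{\epsilon(k)}_i\mathbb{I}_{\zeta_{\sigma(i)},\epsilon(k)}(x^{\epsilon(k)}_i)\Big\}.$$

Step 2. We now claim for $i\in\mathcal{G}$ that

$$\liminf_{k\to\infty}\mathbb{I}_{i,\epsilon(k)}(x^{\epsilon(k)}_i)\ge\mathbb{I}_i(x_i).\qquad(10.7)$$

For $\lambda\in\mathbb{R}^d$, let $\beta_{i,\epsilon}(\lambda)$ and $\beta_i(\lambda)$ be the Perron-Frobenius eigenvalues that correspond to the $\lambda$ tilts of $P(i,\epsilon)$ and $P(i)$ [cf. (2.3)]. From [21], we have that $\lim_{\epsilon\downarrow0}\log\beta_{i,\epsilon}(\lambda)=\log\beta_i(\lambda)$.

Now, for $\lambda'\in\mathbb{R}^d$, observe that

$$\begin{aligned}
\liminf_{k\to\infty}\mathbb{I}_{i,\epsilon(k)}(x^{\epsilon(k)}_i)
&=\liminf_{k\to\infty}\sup_{\lambda}\big\{\langle\lambda,x^{\epsilon(k)}_i\rangle-\log\beta_{i,\epsilon(k)}(\lambda)\big\}\\
&\ge\liminf_{k\to\infty}\big\{\langle\lambda',x^{\epsilon(k)}_i\rangle-\log\beta_{i,\epsilon(k)}(\lambda')\big\}\\
&=\langle\lambda',x_i\rangle-\log\beta_i(\lambda').
\end{aligned}$$

Hence, because $\lambda'$ is arbitrary, we have $\liminf_{k\to\infty}\mathbb{I}_{i,\epsilon(k)}(x^{\epsilon(k)}_i)\ge\sup_\lambda\{\langle\lambda,x_i\rangle-\log\beta_i(\lambda)\}=\mathbb{I}_i(x_i)$.

Step 3. In fact, (10.7) proves the proposition when $M=1$. On the other hand, when $M\ge2$, we have with (10.7) that

$$\liminf_{\epsilon\downarrow0}\mathbb{J}^{\epsilon}_{\mathcal{U}_0}(\Gamma)\ge-\sum_{i=1}^{M-1}\Big(\sum_{j=1}^iv_j\Big)\mathcal{U}_0(\zeta_{\sigma(i)},\zeta_{\sigma(i+1)})+\sum_{i=1}^Mv_i\mathbb{I}_{\zeta_{\sigma(i)}}(x_i)\ge\mathbb{J}_{\mathcal{U}_0}(\Gamma).\qquad\Box$$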
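The variational formula in Step 2 can be evaluated numerically for a two-state chain. The sketch below assumes the common tilt convention in which the tilted matrix has entries $p(x,y)e^{\lambda f(y)}$ (the paper's (2.3) is not reproduced here and may differ), uses the closed-form largest eigenvalue of a $2\times2$ matrix, and maximizes by ternary search, which is valid because the objective is concave in $\lambda$. All names are hypothetical.

```python
import math

def pf_eigenvalue(P, f, lam):
    """Largest eigenvalue of the 2x2 matrix with entries P[x][y] * exp(lam * f[y])."""
    a = P[0][0] * math.exp(lam * f[0]); b = P[0][1] * math.exp(lam * f[1])
    c = P[1][0] * math.exp(lam * f[0]); d = P[1][1] * math.exp(lam * f[1])
    tr, det = a + d, a * d - b * c
    return (tr + math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2

def rate(P, f, z, lo=-30.0, hi=30.0, iters=200):
    """sup over lam in [lo, hi] of lam*z - log(eigenvalue), by ternary search."""
    obj = lambda lam: lam * z - math.log(pf_eigenvalue(P, f, lam))
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if obj(m1) < obj(m2):
            lo = m1
        else:
            hi = m2
    return obj((lo + hi) / 2)

# Sanity case: identical rows give i.i.d. coin flips, where the rate is a relative entropy.
P = [[0.7, 0.3], [0.7, 0.3]]
print(rate(P, [0.0, 1.0], 0.5))
```

For the i.i.d. case with success probability 0.3, the value at $z=0.5$ should agree with $0.5\log(0.5/0.3)+0.5\log(0.5/0.7)$, which gives a quick check on the convention chosen here.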

11. Routing cost comparisons. We separate the proof of Proposition 4.10 
into two separate results. 

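Proposition 11.1. Suppose Assumption B holds. Then, for distinct $i,j\in\mathcal{G}(P)$,

$$\mathcal{T}_1(i,j)\ge\mathcal{T}_0(i,j).$$

Proof. Recall the definitions of $\gamma^1(n,y,z)$ and $\gamma^1(n,(i,j))$. It is enough to prove for $y\in C_i$ and $z\in C_j$ that

$$\liminf_{n\to\infty}\frac1n\log\gamma^1(n,y,z)\ge\mathcal{T}_0(i,j).\qquad(11.1)$$

Then, clearly

$$\mathcal{T}_1(i,j)=\liminf_{n\to\infty}\frac1n\log\gamma^1(n,(i,j))\ge\mathcal{T}_0(i,j),$$

finishing the proof.

We now show (11.1). Let $k$ and $L_k=(i=l_0,l_1,\dots,l_k,l_{k+1}=j)$ be such that

$$\mathcal{T}_0(i,j)=\sum_{s=0}^{k}\tau(l_s,l_{s+1}).$$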
To connect with the definition of 7 1 (n, (i,j)), form vectors x° = • • • , x qo ), ■ ■ ■ , x fc+1 
. . . i^qi"^) with distinct elements in Cj , . . . ,Q fc+1 such that, for < 

s < k, 

x q s = a(! s ,l s +i) and = b n+r{s )(l s , i s+ i), 

1 < 9s < t + 1 and 1 < goj^fe+i < t- In addition, because {-P(i) :i € t/} are 
irreducible, we specify that the paths are possible. Namely, for all large n, 



in _ 1>y) (x.^- l =^)>( Pmin /2y, 

n+r(s-l)+l 



\n+r(s-l),xl)0^ n Xr{J~l)+l ~ X 2) - (Pmin/2) 



and 

F > ( n+r (fc + l)_l i:r fe+ + l i )(^n+r(fc+l) = z ) > Pmin/2 

when q s >2 and 1 < s < fc + 1. Here, x| = (x|, • ■ • ,%q s ) when g s > 2, r(s) = 
E«=o9u and p min is defined in (2.5). 

Since the length of the connecting path from y to z is at most Eq(N, M) 
[cf. near (2.10)], we have 

liminf — log7 1 (?7,, y, z) 

n— >oo fi — 



> lim inf 



log(p m i n /2) WM) 



n 

k+ 

n 



+ — ^2 \°gPn+r{s)( a (h, ls+l),K+r(s)(ls,ls+l)) 
" s=0 



k+1 

J2t(I s ,I s+1 ) =%(i,j) 

s=0 



from Assumption B. □ 



Proposition 11.2. Suppose that Assumption C holds. Then, for distinct $i,j\in\mathcal{G}$,

$$\mathcal{T}_1(i,j)\ge\mathcal{T}_0(i,j).$$



Proof. The proof is similar to that of Proposition 11.1. As before, it 
is enough to show (11.1). Let k and Lj. = (i = Iq, li, . . . , lk,lk+i = j) De such 
that To(i,j) = J2s=o T (h,h+i)- Form the path vector x° = . . . , x qo ) with 
1 < Qo < t of distinct elements in d and state x\ G C/ 1 such that 

p n - 1+iqo+1) (x qo ,x\) = t(n + q , (i, h)) 
and



lim inf - logP (n _ liS/) (X^^ 1 = x°) 

n— >oo fi y 



1 



liminf-[logp n (y,x?) + logp n (x?, x°) H hlogp„(x° )] =0 



77 



Such a vector x° exists from the primitivity of P* (i) . 

Similarly, form vectors x s = (xf , . . . , x s qs ) in C/ s , where 1 < q s < r + 1 for 
1 < s < k and 1 < qu+i < t- Also specify that 



■' +1 )=t(n-l+r(s) + l,(l s J s+1 )) 
for 1 < s < k. In addition, the paths are chosen so 

V-l+r(s-l)+l,^)( X n+r(s-l)+l =X 2) =0 



Pn— l+r(s)+l l x q s j x l 



1 



lim inf — log I 

n 



and 



1 



lim inf - logP (n+r(fc+1) _ M M + i i) (X„ +r(fc+1) = z) = 
when l s ^iQ and g s > 2, and x| and r(s) are as before. Then 

1 1 

liminf-log7 1 (n,y,z) > lim inf - Y] logt(n - 1 +r(s), (Z s ,Z s+ i)) 

s=0 

fc+i 

> ^2t(I s ,I s+1 ) =%(i,j). 

s=0 



□ 



12. Examples. In this section, we present three examples that concern possible LD behaviors of $\{Z_n(f)\}$ under $\mathbb{P}_\pi\in\mathbb{A}(P)$. The first shows that even if Assumption A is violated, an LDP may still hold with respect to some processes and functions $f$. The second example shows that the bounds in Theorems 3.1 and 3.2(i) may be achieved. The third example shows that it is possible that an LDP is nonexistent under Assumption A when one of the submatrices $\{P(i):i\in\mathcal{G}\}$ is periodic and Assumptions B and C do not hold.

12.1. Assumption A is not necessary for LDP. The point is that if the connecting transition probabilities oscillate so that Assumption A fails, but not too wildly, then the process on the large deviation scale can wait an $o(n)$ time to select optimal connections. Let $\Sigma=\{0,1\}$ and initial distribution $\pi=(1/2,1/2)$. Let also $f:\Sigma\to\mathbb{R}$ be given by $f(0)=1$ and $f(1)=0$, and for $k\ge1$, define transition matrices

$$A_k=\begin{pmatrix}1-(\tfrac12)^k&(\tfrac12)^k\\0&1\end{pmatrix}\quad\text{and}\quad B_k=\begin{pmatrix}1-(\tfrac13)^k&(\tfrac13)^k\\0&1\end{pmatrix}.$$

Then, for $n\ge1$, let

$$P_n=\begin{cases}A_n,&\text{for }n\text{ even},\\B_n,&\text{for }n\text{ odd}.\end{cases}$$


The limit matrix $P$ is the $2\times2$ identity matrix $I_2$, with two irreducible sets, $C_0=\{0\}$ and $C_1=\{1\}$. Both sets correspond to degenerate rate functions: for $i=0,1$, $\mathbb{I}_i(x)=0$ for $x=1-i$ and $=\infty$ otherwise. Also, one sees that $\tau(0,1)=-\log3<-\log2=\upsilon(0,1)$, so Assumption A is not satisfied here. Of course, $\tau(1,0)=\upsilon(1,0)=-\infty$. Also, the process satisfies Condition SIE-1.

To identify the large deviations of $\{Z_n(f)\}$ under $\mathbb{P}_\pi$, we focus on sets $\Gamma=(a,b]$ for $0<a<b<1$, because the analysis on other types of sets is similar.

As before, $A(0)$ and $A(1)$ are the events that $\mathbf{X}_n$ does not switch and switches exactly once between sets $C_0$ and $C_1$. Since $\Gamma$ is such that $\mathbb{P}_\pi(Z_n\in\Gamma,A(0))=0$ and also since the chain cannot switch from state 1 to 0, we have

$$\mathbb{P}_\pi(Z_n\in\Gamma)=\mathbb{P}_\pi(Z_n\in\Gamma,A(1))=\mathbb{P}_\pi(Z_n\in\Gamma,A(1),X_1=0,X_n=1).$$



The event $\{A(1),X_1=0,X_n=1\}\subset\Sigma^n$ consists exactly of $n-1$ paths $\mathbf{x}_{n,i}$ that start at 0 but switch to 1 at time $1\le i\le n-1$. Now compute that

$$\mathbb{P}_\pi(\mathbf{X}_n=\mathbf{x}_{n,i})=\pi(0)\Big[\prod_{k=2}^{i}(1-a(k)^k)\Big](a(i+1))^{i+1}=e^{o(n)}(a(i+1))^{i},$$

where $a(k)=1/2$ for $k$ even and $=1/3$ for $k$ odd. Also, on the path $\mathbf{x}_{n,i}$, we have that $Z_n=i/n$.

Let $G^\circ=\{1\le i\le n:i/n\in\Gamma^\circ\}$. Then, by Lemma 5.1, we have

$$\begin{aligned}
\liminf_{n\to\infty}\frac1n\log\mathbb{P}_\pi(Z_n\in\Gamma^\circ,A(1),X_1=0,X_n=1)
&=\liminf_{n\to\infty}\max_{i\in G^\circ}\frac1n\log\mathbb{P}_\pi(\mathbf{X}_n=\mathbf{x}_{n,i})\\
&=-a\log(2).
\end{aligned}$$

Similarly, $\limsup(1/n)\log\mathbb{P}_\pi(Z_n\in\bar\Gamma,A(1),X_1=0,X_n=1)=-a\log(2)$.

A related analysis works for more general $\Gamma$ and we have that $\{Z_n(f)\}$ satisfies an LDP with rate function

$$\mathbb{I}(z)=\begin{cases}z\log2,&z\in[0,1),\\0,&z=1,\\\infty,&\text{otherwise}.\end{cases}$$
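The rate $-a\log2$ in this example can be checked by summing the one-switch path probabilities exactly. The sketch below assumes the time convention that the path is $X_1,\dots,X_n$ with $X_1\sim\pi$ and that $P_k$ governs the move from time $k-1$ to $k$; the function names are hypothetical, and logarithms are used throughout to avoid underflow.

```python
import math

def a(k):
    """Base of the switching probability: P_k(0,1) = a(k)**k."""
    return 0.5 if k % 2 == 0 else 1.0 / 3.0

def log_prob_in_window(n, lo, hi):
    """log P(Z_n in (lo, hi]) for the two-state chain of Example 12.1."""
    log_stay = 0.0   # log of prod_{k=2}^{i} (1 - a(k)**k): staying at 0 up to time i
    terms = []
    for i in range(1, n):
        if i >= 2:
            log_stay += math.log1p(-a(i) ** i)
        if lo * n < i <= hi * n:   # path with i zeros has Z_n = i/n
            terms.append(math.log(0.5) + log_stay + (i + 1) * math.log(a(i + 1)))
    mx = max(terms)
    return mx + math.log(sum(math.exp(t - mx) for t in terms))

for n in (500, 2000, 4000):
    print(n, log_prob_in_window(n, 0.3, 0.6) / n, -0.3 * math.log(2))
```

The chain only needs to wait one extra step to hit an even index, so the cheaper base $1/2$ is always available and the $\log3$ cost never shows up on the scale $1/n$.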
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12.2. Bounds may be sharp in Theorems 3.1 and 3.2. The key in this ex- 
ample is that the connection probabilities oscillate "unboundedly," so pick- 
ing out the optimal strategy is time-dependent. As before, let $\Sigma = \{0,1\}$,
$\pi = (1/2,1/2)$ and let $f : \Sigma \to \mathbb{R}$ be given by $f(0) = 1$ and $f(1) = 0$. Let
$\{g(n)\}$ be a fast divergent sequence of integers, $g(n) \uparrow \infty$, $g(n) < g(n+1)$
and $g(n-1)/g(n) \to 0$. Also, for $k \ge 1$, let

\[
P_i = \begin{cases}
I_2, & \text{for } 1 \le i \le g(2),\\
A_i, & \text{for } g(2k) < i \le g(2k+1),\\
B_i, & \text{for } g(2k+1) < i \le g(2k+2),
\end{cases}
\]

where $A_i$ and $B_i$ are defined in Section 12.1.

To compute the large deviations of $\{Z_n(f)\}$, we focus now on sets $\Gamma =
(a, b) \subset [0, 1]$, where $0 < a < b < 1$. Calculations for other sets are analogous.
Then, in the notation of the previous example,

\[
\liminf \frac{1}{n} \log P_\pi(Z_n \in \Gamma) = \liminf \frac{1}{n} \log P_\pi(Z_n \in \Gamma, A(1), X_1 = 0, X_n = 1).
\]

Let now $n_k = g(2k+2)$ for $k \ge 1$. Then $i/n_k \in \Gamma$ exactly when
$\lceil g(2k+2)a \rceil \le i \le \lfloor g(2k+2)b \rfloor$. Also, whereas

\[
\lim_k \frac{\lceil g(2k+2)a \rceil}{g(2k+2)} = a > 0 = \lim_k \frac{g(2k+1)+1}{g(2k+2)},
\]

we have for all large $k$ that $g(2k+1) + 1 < \lceil g(2k+2)a \rceil \le \lfloor g(2k+2)b \rfloor <
g(2k+2)$. Note also that $P_i = B_i$ for $g(2k+1) + 1 \le i \le g(2k+2)$. Hence,

\[
\begin{aligned}
\lim_k \frac{1}{n_k} \log P_\pi(Z_{n_k} \in \Gamma, A(1), X_1 = 0, X_{n_k} = 1)
&= \liminf_k \max_{i :\, i/n_k \in \Gamma} \frac{1}{n_k} \log P_\pi(\mathbf{X}_{n_k} = \mathbf{x}_{n_k,i})\\
&= \lim_k \frac{1}{g(2k+2)} \log \Big(\frac{1}{3}\Big)^{\lceil g(2k+2)a \rceil} = -a \log(3).
\end{aligned}
\]

Moreover, in fact $\liminf (1/n) \log P_\pi(Z_n \in \Gamma) = -a \log(3)$.

Similarly, by considering the subsequence $n_k = g(2k+1)$, we get

\[
\limsup \frac{1}{n} \log P_\pi(Z_n \in \Gamma, A(1), X_1 = 0, X_n = 1) = -a \log(2).
\]

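These calculations, and analogous ideas give, for any $\Gamma$, that

\[
\limsup \frac{1}{n} \log P_\pi(Z_n \in \Gamma) = -\inf_{z \in \Gamma} \overline{I}(z)
\]

and

\[
\liminf \frac{1}{n} \log P_\pi(Z_n \in \Gamma^\circ) = -\inf_{z \in \Gamma^\circ} \underline{I}(z),
\]

where

\[
\overline{I}(z) = \begin{cases} z \log 2, & \text{for } z \in [0,1),\\ 0, & z = 1,\\ \infty, & \text{otherwise}, \end{cases}
\]

and

\[
\underline{I}(z) = \begin{cases} z \log 3, & \text{for } z \in [0,1),\\ 0, & z = 1,\\ \infty, & \text{otherwise}. \end{cases}
\]

On the other hand, these lower and upper rate functions match those in
Theorems 3.1 and 3.2(i). Whereas $\mathcal{T}(0,1) = -\log 3$, $\mathcal{U}(0,1) = -\log 2$ and
$t(k,(1,0)) = 0$ for all $k \ge 1$, we have

\[
\mathbb{J}_{\mathcal{T}_0}(z) = \inf_{\delta \in [0,1]} \; \inf_{(x,y) \in D(2,(\delta,1-\delta),z)} \min\big\{ I_0(y),\; \delta \log(3) + \delta I_0(x) + (1-\delta) I_1(y) \big\}
= \underline{I}(z)
\]

and analogously $\mathbb{J}_{\mathcal{U}_0} = \overline{I}$.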
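
The oscillation between the exponents $-a\log 3$ and $-a\log 2$ can be observed numerically by evaluating the exact path sum along the two subsequences $n = g(2k+2)$ and $n = g(2k+1)$. The sketch below is illustrative only: it assumes the block sequence $g(k) = 10^k$ (for which $g(n-1)/g(n)$ is small but does not tend to 0), the window $\Gamma = (0.3, 0.6)$, and hypothetical helper names `g`, `base` and `rate`.

```python
import math

def g(k):
    # Assumed block boundaries for illustration only.
    return 10 ** k

def base(m, g):
    """Connection base used by P_m: 0 (P_m = I_2), 1/2 (A_m) or 1/3 (B_m)."""
    if m <= g(2):
        return 0.0
    j = 2
    while m > g(j + 1):
        j += 1
    # g(j) < m <= g(j+1): even j gives A_m, odd j gives B_m.
    return 0.5 if j % 2 == 0 else 1.0 / 3.0

def rate(n, lo, hi, g):
    """(1/n) log P(Z_n in (lo, hi)) for the block chain with pi = (1/2, 1/2)."""
    stay, terms = 0.0, []
    for i in range(1, n):
        c = base(i, g)
        if i >= 2 and c > 0.0:
            stay += math.log1p(-c ** i)      # chain stays in state 0 under P_i
        c_next = base(i + 1, g)
        if lo < i / n < hi and c_next > 0.0:
            terms.append(math.log(0.5) + stay + (i + 1) * math.log(c_next))
    m = max(terms)
    return (m + math.log(sum(math.exp(t - m) for t in terms))) / n

print(rate(g(4), 0.3, 0.6, g), -0.3 * math.log(3))   # n in a B-block
print(rate(g(5), 0.3, 0.6, g), -0.3 * math.log(2))   # n in an A-block
```

Along $n = g(4)$ every admissible switching time falls in a $B$-block, and along $n = g(5)$ in an $A$-block, which is the mechanism behind the distinct lower and upper rate functions.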



12.3. Periodicity and nonexistence of LDP. We consider a process which
satisfies Assumption A but not Assumptions B or C, and for which an LDP
cannot hold, as shown through an explicit contradiction. Also, we show that the lower
bound with respect to $\mathcal{T}_0$ in Theorem 3.2 does not work for this example.

Let $\Sigma = \{1, \ldots, 9\}$ and let $\pi$ be the uniform distribution on $\Sigma$. For $n$ of
the form $n = 1 + 3j$ for $j \ge 0$, except when $n = 3^{2^j} + 1$ for $j \ge 5$, let $\hat{P}_n$, $\hat{P}_{n+1}$ and $\hat{P}_{n+2}$ be the following $9 \times 9$ matrices.

[Display of the three $9 \times 9$ matrices $\hat{P}_n$, $\hat{P}_{n+1}$ and $\hat{P}_{n+2}$. The entries that survive in the source are $1/3$, $1$ and the connection weights $t(m,(1,2))$, $t(m,(2,3))$ and $\hat{t}(m,(2,3))$ for $m = n, n+1, n+2$; all remaining entries are 0. The positions of the entries are not recoverable from the source.]

For $n = 3^{2^j} + 1$ for $j \ge 5$, let $\hat{P}_{n+1}$ and $\hat{P}_{n+2}$ be defined as before, but now
let $\hat{P}_n$ be the following matrix.

[Display of the $9 \times 9$ matrix $\hat{P}_n$. The entries that survive in the source are $1/3$, $1$, $t(n,(1,2))$, $t(n,(2,3))$ and $\hat{t}(n,(2,3))$; their positions are not recoverable from the source.]



Suppose now that $t(n,(1,2))$, $t(n,(2,3))$ and $\hat{t}(n,(2,3))$ vanish as $n$ tends
to infinity and that the limits

\[
\lim \frac{1}{n} \log t(n,(1,2)), \quad \lim \frac{1}{n} \log t(n,(2,3)) \quad\text{and}\quad \lim \frac{1}{n} \log \hat{t}(n,(2,3))
\]

exist and equal, respectively,

\[
\upsilon(1,2) = \tau(1,2) = 0, \qquad \upsilon(2,3) = \tau(2,3) = A
\]

and

\[
\lim \frac{1}{n} \log \hat{t}(n,(2,3)) = 2A + \epsilon,
\]

where $A < 0$ and $\epsilon > 0$ is chosen small enough so that $2A + \epsilon < A$.

Define the diagonal matrix $\Lambda_n = \operatorname{diag}\{\Lambda_1^{-1}, \ldots, \Lambda_9^{-1}\}$, where $\Lambda_i$ is the $i$th
row sum of $\hat{P}_n$. Then $\lim \Lambda_n = I_9$. Let $P_n = \Lambda_n \hat{P}_n$ for $n \ge 1$. The limit
matrix $P = \lim P_n = \lim \hat{P}_n$ corresponds to three sets: $C_1 = \{1,2,3\}$, $C_2 =
\{4,5,6\}$ and $C_3 = \{7,8,9\}$.

Let also $f$ be a one-dimensional function on the state space such that
$f(1) = f(2) = f(3) = 1$, $f(4) = f(5) = f(6) = 2$ and $f(7) = f(8) = f(9) = 3$.
We now concentrate on the sequence $\{Z_n(f)\}$ with respect to the process $\{X_n\}$.
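
The row normalization $P_n = \Lambda_n \hat{P}_n$ can be illustrated on a small toy matrix. This is only a sketch: the $3 \times 3$ matrix, the weight `t` and the function name `normalize_rows` are assumptions made for illustration and are not the $9 \times 9$ matrices of this example.

```python
def normalize_rows(P_hat):
    """Divide each row of a nonnegative matrix by its row sum (rows must be nonzero)."""
    out = []
    for row in P_hat:
        s = sum(row)
        out.append([entry / s for entry in row])
    return out

# Toy nonnegative matrix: a stochastic part plus a small connection weight t.
t = 1e-3
P_hat = [[0.5, 0.5, t],
         [0.5, 0.5, 0.0],
         [0.0, 0.0, 1.0]]
P = normalize_rows(P_hat)
print(P[0])
```

As `t` tends to 0 the row sums tend to 1, so the normalizing diagonal matrix tends to the identity and the normalized and unnormalized matrices share the same limit.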



Assumptions. By inspection, it is clear that Condition SIE-1 and As- 
sumption A hold, but Assumptions B and C do not hold. 
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Nonexistence of LDP. First, let be the measure constructed from 
$\{\hat{P}_n\}$ and $\pi$ through CON. It is not difficult to see that the large deviation
behavior of $Z_n$ under $P_\pi$ is the same as with respect to $\mu_\pi$, that is, for Borel $\Gamma \subset \mathbb{R}^d$,

\[
\limsup \frac{1}{n} \log P_\pi(Z_n \in \Gamma) = \limsup \frac{1}{n} \log \mu_\pi(Z_n \in \Gamma)
\]

and

\[
\liminf \frac{1}{n} \log P_\pi(Z_n \in \Gamma^\circ) = \liminf \frac{1}{n} \log \mu_\pi(Z_n \in \Gamma^\circ)
\]

(cf. Proposition 7.1). Second, the rate functions on the three sets are degenerate:

\[
I_i(z) = \begin{cases} 0, & \text{if } z = i,\\ \infty, & \text{otherwise}, \end{cases} \qquad \text{for } i = 1,2,3.
\]

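Consider now the following two lemmas, which are proved later.

Lemma 12.1. For $0 < \varepsilon < 1/2$, let $\Gamma = [2 + \varepsilon, 2 + 2\varepsilon]$. Then

\[
\limsup \frac{1}{n} \log \mu_\pi(Z_n \in \Gamma) > (1 - 2\varepsilon)A.
\]

Lemma 12.2. For $0 < \varepsilon < 1/2$ and $\theta > 0$, let $\Gamma(\theta) = (2 + \varepsilon - \theta, 2 + 2\varepsilon + \theta)$.
Then

\[
\liminf_{\theta \downarrow 0} \liminf_{n} \frac{1}{n} \log \mu_\pi(Z_n \in \Gamma(\theta)) \le (1 - 2\varepsilon)A.
\]

These results show that no LDP is possible. If an LDP were to hold with
rate function $I$, say, then

\[
\begin{aligned}
(1 - 2\varepsilon)A &\ge \liminf_{\theta \downarrow 0} \liminf_{n} \frac{1}{n} \log \mu_\pi(Z_n \in \Gamma(\theta))\\
&\ge \liminf_{\theta \downarrow 0} \Big( -\inf_{x \in \Gamma(\theta)} I(x) \Big) \ge -\inf_{x \in \Gamma} I(x)\\
&\ge \limsup_{n \to \infty} \frac{1}{n} \log \mu_\pi(Z_n \in \Gamma) > (1 - 2\varepsilon)A,
\end{aligned}
\]

leading to a contradiction.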

Lower bound in Theorem 3.2(ii) does not hold. Consider the following
lemma, proved at the end of this section.

Lemma 12.3. With respect to $\Gamma(\theta)$ as in Lemma 12.2, we have

\[
-\inf_{z \in \Gamma(\theta)} \mathbb{J}_{\mathcal{T}_0}(z) = (1 - 2\varepsilon - \theta)\frac{A}{2}.
\]

Then a clear contradiction with Lemma 12.2 would arise if the lower
bound in Theorem 3.2(ii) were valid: since $A < 0$, the right-hand side tends to $(1 - 2\varepsilon)A/2 > (1 - 2\varepsilon)A$ as $\theta \downarrow 0$.

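Proof of Lemma 12.1. No $n$-word $\mathbf{x}_n$ that remains solely in a single
closed set can have an average in $\Gamma$. Also, by construction, no $n$-word can
pass from $C_i$ to $C_j$ for $i > j$, or in one step from $C_1$ to $C_3$. Therefore, the
only $n$-words such that $(1/n) \sum_{i=1}^{n} f(x_i) \in [2 + \varepsilon, 2 + 2\varepsilon]$ are those which visit
successively $C_1$, $C_2$ and $C_3$ or those which visit first $C_2$ and then $C_3$.

We now examine $(1/n) \log \mu_\pi(Z_n \in \Gamma)$ along the sequence

\[
n_k = \Big\lfloor 3^{2^k} \frac{2}{1 - 2\varepsilon} \Big\rfloor
\]

for $k \ge 1$. Let now $A(n_k)$ be the set of $n_k$-words $\mathbf{x}_{n_k}$ which stay in $C_1$ until
time $3^{2^k}$, spend one time unit in $C_2$ and then switch to $C_3$. By definition,
for $\mathbf{x}_{n_k} \in A(n_k)$ and $k$ large enough, we have

\[
\frac{1}{n_k} \sum_{i=1}^{n_k} f(x_i) = \frac{3^{2^k}}{n_k} + \frac{2}{n_k} + \Big(1 - \frac{3^{2^k} + 1}{n_k}\Big) 3 \in [2 + \varepsilon, 2 + 2\varepsilon].
\]

Then, with $\delta(\varepsilon) = (1 - 2\varepsilon)/2$, we have

\[
\begin{aligned}
\limsup_{n \to \infty} \frac{1}{n} \log \mu_\pi(Z_n \in \Gamma)
&\ge \liminf_{k \to \infty} \frac{1}{n_k} \log \mu_\pi(Z_{n_k} \in \Gamma, A(n_k))\\
&\ge -\inf_{(x,y) \in D(2,(\delta(\varepsilon),1-\delta(\varepsilon)),\Gamma)} \Big\{ -\delta(\varepsilon) \Big( \liminf \frac{1}{k} \log t(k,(1,2)) + \liminf \frac{1}{k} \log \hat{t}(k,(2,3)) \Big)\\
&\qquad\qquad + \delta(\varepsilon) I_1(x) + (1 - \delta(\varepsilon)) I_3(y) \Big\}\\
&= \delta(\varepsilon) \Big( \lim \frac{1}{k} \log t(k,(1,2)) + \lim \frac{1}{k} \log \hat{t}(k,(2,3)) \Big) - \delta(\varepsilon) I_1(1) - (1 - \delta(\varepsilon)) I_3(3)\\
&= \frac{1 - 2\varepsilon}{2} (2A + \epsilon) > (1 - 2\varepsilon)A. \qquad \Box
\end{aligned}
\]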

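Proof of Lemma 12.2. Let now $n_k = 3^{2^k}$ for $k \ge 1$. We first show that
$\mathbf{x}_{n_k}$ cannot visit $C_1$, $C_2$ and $C_3$ in succession and satisfy $(1/n_k) \sum_{i=1}^{n_k} f(x_i) \in
\Gamma(\theta)$ for all small $\theta$. Indeed, by construction, a path $\mathbf{x}_{n_k}$ which visits $C_1$, $C_2$
and $C_3$ must switch from $C_2$ to $C_3$ at a time less than or equal to $3^{2^{k-1}} + 1$.
However, then, because $f(\cdot) \ge 1$ and $3^{2^{k-1}}/n_k \to 0$, we have for large $k$ and
$\theta$ sufficiently small that

\[
\frac{1}{n_k} \sum_{i=1}^{n_k} f(x_i) \ge \frac{3^{2^{k-1}} + 1}{n_k} + \Big(1 - \frac{3^{2^{k-1}} + 1}{n_k}\Big) 3 > 2 + 2\varepsilon + \theta.
\]

Thus, if the average of $\mathbf{x}_{n_k}$ lies in $\Gamma(\theta)$, we deduce that $\mathbf{x}_{n_k}$ begins in $C_2$ and then switches to $C_3$.
Now let $r(\varepsilon) = 1 - 2\varepsilon$. We have

\[
\begin{aligned}
\liminf_{\theta \downarrow 0} \liminf_{n} \frac{1}{n} \log \mu_\pi(Z_n \in \Gamma(\theta))
&\le \liminf_{\theta \downarrow 0} \liminf_{k \to \infty} \frac{1}{n_k} \log \mu_\pi(Z_{n_k} \in \Gamma(\theta))\\
&= \liminf_{\theta \downarrow 0} \sup_{0 \le \delta \le 1} \; \sup_{(x,y) \in D(2,(\delta,1-\delta),\Gamma(\theta))} \Big\{ \delta \limsup \frac{1}{k} \log t(k,(2,3)) - \delta I_2(x) - (1 - \delta) I_3(y) \Big\}\\
&= r(\varepsilon) \limsup \frac{1}{k} \log t(k,(2,3)) = (1 - 2\varepsilon)A,
\end{aligned}
\]

because $r(\varepsilon)$ is the smallest $\delta$ such that $(2,3) \in D(2,(\delta,1-\delta),[2 + \varepsilon, 2 + 2\varepsilon])$. $\Box$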

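Proof of Lemma 12.3. Since motion is possible only from $C_1$ to $C_2$
to $C_3$, and the corresponding rate functions are degenerate at $x_1 = 1$, $x_2 = 2$
and $x_3 = 3$, we have

\[
\begin{aligned}
-\inf_{z \in \Gamma(\theta)} \mathbb{J}_{\mathcal{T}_0}(z)
&= \sup_{\substack{v_1 + v_2 + v_3 = 1\\ 0 \le v_1, v_2, v_3 \le 1}} \; \sup_{x \in D(3,v,\Gamma(\theta))} \Big\{ v_1 \mathcal{T}(1,2) + (v_1 + v_2) \mathcal{T}(2,3) - \sum_{i=1}^{3} v_i I_i(x_i) \Big\}\\
&= \sup_{\substack{v_1 + 2v_2 + 3(1 - v_1 - v_2) \in \Gamma(\theta)\\ 0 \le v_1, v_2 \le 1}} (v_1 + v_2)A\\
&= (1 - 2\varepsilon - \theta)(A/2). \qquad \Box
\end{aligned}
\]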
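
The last supremum in the proof of Lemma 12.3 can be checked by brute force on a grid. The sketch below is illustrative only; the function name `sup_rate`, the grid size and the sample values $A = -1$, $\varepsilon = 0.2$, $\theta = 0.05$ are assumptions.

```python
def sup_rate(A, eps, theta, steps=400):
    """Grid search for sup (v1 + v2) * A subject to
    v1 + 2*v2 + 3*(1 - v1 - v2) in (2 + eps - theta, 2 + 2*eps + theta)."""
    best = float("-inf")
    for i in range(steps + 1):
        v1 = i / steps
        for j in range(steps + 1 - i):      # keeps v1 + v2 <= 1
            v2 = j / steps
            mean = v1 + 2 * v2 + 3 * (1 - v1 - v2)
            if 2 + eps - theta < mean < 2 + 2 * eps + theta:
                best = max(best, (v1 + v2) * A)
    return best

A, eps, theta = -1.0, 0.2, 0.05
print(sup_rate(A, eps, theta), (1 - 2 * eps - theta) * A / 2)
```

Because $A < 0$, the supremum is approached by spending as little time as possible outside $C_3$, which means taking $v_2 = 0$ and $v_1$ just above $(1 - 2\varepsilon - \theta)/2$.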
APPENDIX 

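A.1. Proof of Lemma 4.1. We consider separately the situations when
$0 < \delta \le 1$ and $\delta = 0$.

Case $\delta > 0$. Let $\hat{t}_n = \sup_{s \ge n} t_s$. Then $t_n \le \hat{t}_n$, $0 < \hat{t}_n \le 1$ and $\hat{t}_n \downarrow \delta$. Also

\[
\lim \frac{1}{n} \log \hat{t}_n = 0 = \limsup \frac{1}{n} \log t_n.
\]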

Case $\delta = 0$. The proof is split into two subcases.

Subcase 1. $\limsup (1/n) \log t_n = t < 0$. If $t_n$ vanishes eventually, that is,
$t_n = 0$ for $n \ge N_0$, some $N_0 \ge 1$, then we may take

\[
\hat{t}_n = \begin{cases} 1, & \text{for } 1 \le n < N_0,\\ e^{-n^2}, & \text{for } n \ge N_0. \end{cases}
\]

Otherwise, let $a_n = \sup_{j \ge n} (1/j) \log t_j$ and $\hat{t}_n = \exp\{\sup_{j \ge n} j a_j\}$. Note $a_n \downarrow
t$ and $\hat{t}_n \ge \exp\{n a_n\} \ge \exp\{n (1/n) \log t_n\} = t_n$, and also that $1 \ge \hat{t}_n > 0$. In
addition, $(1/n) \log \hat{t}_n \ge a_n \to t$.

Let now $1 > \varepsilon > 0$ and let $N_1$ be such that $a_n \le (1 - \varepsilon)t$ for $n \ge N_1$. Then

\[
\frac{1}{n} \log \hat{t}_n \le \frac{1}{n} \sup_{j \ge n} j\, t (1 - \varepsilon) = t(1 - \varepsilon)
\]

for $n \ge N_1$. Whereas $\varepsilon$ is arbitrary, we then have $(1/n) \log \hat{t}_n \to t$.

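Subcase 2. $\limsup (1/n) \log t_n = t = 0$. As $t_n \to 0$, we have $t_n < 1$ for
$n \ge N_2$, say. Let $b_j = \max_{N_2 \le l \le j} (1/l) \log t_l$ for $j \ge N_2$ and let

\[
\hat{t}_n = \begin{cases} 1, & \text{for } n < N_2,\\ \exp\Big\{ \sup_{j \ge n} j b_j \Big\}, & \text{for } n \ge N_2. \end{cases}
\]

Note that $t_n \le \hat{t}_n$ and $1 \ge \hat{t}_n > 0$, and as $\sup_{j \ge n} j b_j$ decreases with $n$, that
$\hat{t}_n$ is a decreasing sequence.

We now identify the limit. Note that $b_j < 0$ for all $j \ge N_2$ and $\limsup_l (1/l) \log t_l = 0$.
Then, for each $K \ge N_2$, there is an index $J_K \ge K$ such that

\[
b_j = \max_{K \le l \le j} (1/l) \log t_l \quad \text{for } j \ge J_K.
\]

Hence, for large $n$ and given $K \ge N_2$,

\[
\hat{t}_n = \exp\Big\{ \sup_{j \ge n} j \max_{K \le l \le j} (1/l) \log t_l \Big\} \le \exp\Big\{ \sup_{j \ge n} \max_{K \le l \le j} \log t_l \Big\} = \sup_{j \ge n} \max_{K \le l \le j} t_l.
\]

Whereas $K$ is arbitrary, we have that $\hat{t}_n \downarrow 0$.

Finally, as $b_j \to 0$, we have for $\varepsilon > 0$ and large $n$ that

\[
0 \ge (1/n) \log \hat{t}_n = (1/n) \sup_{j \ge n} j b_j \ge (1/n) \sup_{j \ge n} j(-\varepsilon) = -\varepsilon.
\]

Whereas $\varepsilon$ is arbitrary, we have $(1/n) \log \hat{t}_n \to 0$. $\Box$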
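
The construction $\hat{t}_n = \exp\{\sup_{j \ge n} j a_j\}$ with $a_j = \sup_{i \ge j} (1/i) \log t_i$ from Subcase 1 can be sketched with two backward passes. This is a hedged illustration on a finite horizon $N$: suprema over $j \ge n$ are replaced by maxima over $n \le j \le N$, which is an assumption, and the function name `monotone_majorant` and the test sequence are hypothetical.

```python
import math

def monotone_majorant(log_t):
    """Given log_t[j-1] = log t_j for j = 1..N (all finite), return the list of
    log that_n = max_{n <= j <= N} j * a_j, where a_j = max_{j <= i <= N} (1/i) log t_i."""
    N = len(log_t)
    a = [0.0] * N
    run = float("-inf")
    for j in range(N, 0, -1):               # backward pass for a_j
        run = max(run, log_t[j - 1] / j)
        a[j - 1] = run
    out = [0.0] * N
    run = float("-inf")
    for n in range(N, 0, -1):               # backward pass for log that_n
        run = max(run, n * a[n - 1])
        out[n - 1] = run
    return out

# Oscillating example: t_n = 2^{-n} for even n and 3^{-n} for odd n.
log_t = [n * math.log(0.5 if n % 2 == 0 else 1.0 / 3.0) for n in range(1, 51)]
log_that = monotone_majorant(log_t)
print(log_that[9], -10 * math.log(2))
```

On this example the majorant is $2^{-n}$: it dominates $t_n$, is decreasing, and has exponential rate equal to the limsup rate $-\log 2$ of the original sequence.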

A.2. An extended Gärtner-Ellis theorem. We give here a minor extension
of the Gärtner-Ellis theorem and state some general conditions under
which a sequence of bounded nonnegative measures $\{\mu_n\}$ on $\mathbb{R}^d$ satisfies an
LDP.

For $\lambda \in \mathbb{R}^d$, define the extended real sequence $\Lambda_n(\lambda) = \log \int e^{\langle \lambda, x \rangle} \, d\mu_n(x)$
and also $\Lambda(\lambda) = \lim_{n \to \infty} (1/n) \Lambda_n(n\lambda)$, provided the extended limit exists. We
now recall when $\Lambda$ is essentially smooth (cf. [10]).

Assumption E.

1. For all $\lambda \in \mathbb{R}^d$, $\Lambda(\lambda)$ exists as an extended real number in $(-\infty, \infty]$.

2. Let $D_\Lambda = \{\lambda : -\infty < \Lambda(\lambda) < \infty\}$. Suppose $0 \in D_\Lambda^\circ$.

3. The function $\Lambda(\cdot)$ is differentiable throughout $D_\Lambda^\circ$.

4. When $\{\lambda_n\} \subset D_\Lambda^\circ$ converges to a boundary point of $D_\Lambda^\circ$, we have $|\nabla \Lambda(\lambda_n)| \to \infty$.

5. The function $\Lambda(\lambda)$ is a lower semicontinuous function.

We now state the standard Gärtner-Ellis theorem (cf. [10]).

Proposition A.1. Let $\{\nu_n\}$ be a sequence of probability measures which
satisfy Assumption E. Let $I$ be the Legendre transform of $\Lambda$. Then $I$ is a rate
function and $\{\nu_n\}$ satisfies LDP (2.1).

The main result of this section is the following proposition. 

Proposition A.2. Let $\{\mu_n\}$ be a sequence of bounded nonnegative measures
on $\mathbb{R}^d$ that satisfy Assumption E. Let $I$ be the Legendre transform of
$\Lambda$. Then $I$ is an extended rate function and the LDP (2.1) holds. Moreover,
$I$ can be decomposed as the difference of a rate function of a probability
sequence and a constant, $I = I^1 - \Lambda(0)$.

Proof. By Assumption E, with $\lambda = 0$, we have that $(1/n) \log \mu_n(\mathbb{R}^d) \to
\Lambda(0) \in \mathbb{R}$. Consider now the probability measures $\nu_n(\cdot) = \mu_n(\cdot)/\mu_n(\mathbb{R}^d)$. The
pressure of the sequence $\{\nu_n\}$ is calculated as $\Lambda(\cdot) - \Lambda(0)$. Since Assumption E
holds for $\Lambda(\cdot)$, it also holds for the shifted function $\Lambda(\cdot) - \Lambda(0)$.
Therefore, by Proposition A.1, we have that $\{\nu_n\}$ satisfies (2.1) with rate
function $I^1$ given by

\[
\begin{aligned}
I^1(x) &= \sup_{\lambda} \{ \langle \lambda, x \rangle - (\Lambda(\lambda) - \Lambda(0)) \}\\
&= \sup_{\lambda} \{ \langle \lambda, x \rangle - \Lambda(\lambda) \} + \Lambda(0).
\end{aligned}
\]

Let now $I(x) = \sup_{\lambda} \{ \langle \lambda, x \rangle - \Lambda(\lambda) \}$, so that $I = I^1 - \Lambda(0)$. Whereas $\mu_n(\cdot) =
\mu_n(\mathbb{R}^d) \nu_n(\cdot)$, by translating we obtain that (2.1) holds for the $\{\mu_n\}$ sequence
with rate function $I$. $\Box$

A. 3. Proof of Proposition 2.1. 
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Extended pressure A. We follow the method in [10] to identify the ex- 
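tended pressure of the sequence $\{\mu_n\}$:

\[
\begin{aligned}
\Lambda(\lambda) &= \lim \frac{1}{n} \Lambda_n(n\lambda)\\
&= \lim \frac{1}{n} \log \int \exp\Big( \Big\langle \lambda, \sum_{i=1}^{n} f(X_i) \Big\rangle \Big) \, d\mu_n\\
&= \lim \frac{1}{n} \log \big( \pi^t (\Pi_{C,\lambda})^n 1 \big).
\end{aligned}
\]

Since $\Pi_{C,\lambda}$ is an irreducible matrix, the Perron-Frobenius eigenvalue $\rho(C,\lambda)$
possesses a right Perron-Frobenius eigenvector $v(\lambda)$ with positive entries.
Let $a$ and $b$ be the smallest and largest entries. Then

\[
\frac{1}{n} \log \big( \pi^t (\Pi_{C,\lambda})^n 1 \big) \le \frac{1}{n} \log \Big( \frac{1}{a} \pi^t (\Pi_{C,\lambda})^n v \Big) = \frac{1}{n} \log \Big( \frac{1}{a} \pi^t v \Big) + \log \rho(C,\lambda)
\]

and, similarly, $(1/n) \log(\pi^t (\Pi_{C,\lambda})^n 1) \ge \log \rho(C,\lambda) + o(1)$. Hence,

\[
\Lambda(\lambda) = \lim \frac{1}{n} \Lambda_n(n\lambda) = \log \rho(C,\lambda).
\]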
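
The identification $\Lambda(\lambda) = \log \rho(C,\lambda)$ can be illustrated numerically by iterating $\pi^t (\Pi_{C,\lambda})^n$ with renormalization. The sketch below is an illustration under stated assumptions: the tilting convention $\Pi_{C,\lambda}(i,j) = P_C(i,j) e^{\lambda f(j)}$ with scalar $\lambda$, the $2 \times 2$ matrix, and the names `tilted` and `pressure` are all hypothetical choices, not taken from the paper.

```python
import math

def tilted(P, f, lam):
    """Assumed tilting convention: Pi(i, j) = P(i, j) * exp(lam * f[j])."""
    n = len(P)
    return [[P[i][j] * math.exp(lam * f[j]) for j in range(n)] for i in range(n)]

def pressure(Pi, pi0, n):
    """(1/n) log(pi^t Pi^n 1), renormalizing at each step to avoid overflow."""
    row, log_mass = list(pi0), 0.0
    d = len(row)
    for _ in range(n):
        row = [sum(row[i] * Pi[i][j] for i in range(d)) for j in range(d)]
        s = sum(row)
        log_mass += math.log(s)
        row = [r / s for r in row]
    return log_mass / n

P = [[0.9, 0.1], [0.2, 0.8]]
Pi = tilted(P, [1.0, 0.0], 0.5)

# Closed-form Perron-Frobenius eigenvalue of a 2 x 2 matrix, for comparison.
tr = Pi[0][0] + Pi[1][1]
det = Pi[0][0] * Pi[1][1] - Pi[0][1] * Pi[1][0]
rho = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0
print(pressure(Pi, [0.5, 0.5], 2000), math.log(rho))
```

The gap between the two printed values is of order $1/n$, matching the $o(1)$ correction coming from the bounded ratio of the eigenvector entries.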



Analyticity, convexity and essential smoothness of $\Lambda$. Perron-Frobenius
theory guarantees that $\rho(\lambda)$ has multiplicity 1 and is positive for all $\lambda \in \mathbb{R}^d$.
Then, by Theorem 7.7.1 in [21], $\rho(\cdot)$ is analytic and so $\Lambda(\cdot)$ is analytic. Now,
because $\Lambda(\lambda)$ is the limit of a sequence of convex functions, it is convex.
Finally, by the comments of Section 3.1 in [10], we have that $\Lambda$ is essentially
smooth.

$I$ is an extended rate function and $\{\mu_n\}$ satisfies an LDP. Recall now
that $I = I_C$ is the Legendre transform $I(x) = \sup_{\lambda \in \mathbb{R}^d} \{ \langle \lambda, x \rangle - \Lambda(\lambda) \}$. By Proposition
A.2, we have that $I$ is an extended rate function and $\{\mu_n\}$ satisfies
an LDP with respect to $I$.

$I$ is a rate function when $\Pi_C$ is substochastic. When $\Pi_C$ is substochastic,
we have $\Lambda(0) \le 0$. Hence, by Proposition A.2, $I = I^1 - \Lambda(0) \ge 0$ and so is a
rate function.

$I$ is not identically $\infty$. Let $x = \nabla \Lambda(0)$. Then, by Theorem 23.5 in [25],

\[
I(x) = \sup_{\lambda \in \mathbb{R}^d} \{ \langle \lambda, \nabla \Lambda(0) \rangle - \Lambda(\lambda) \} = \langle 0, \nabla \Lambda(0) \rangle - \Lambda(0) = -\Lambda(0) < \infty.
\]
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Convexity of I and strict convexity on the relative interior ofQc- Whereas 
$\Lambda$ is convex, the Legendre transform $I$ is convex. Also, because $\Lambda(\cdot)$ is real-valued
and lower semicontinuous, by Lemma 4.5.8 of [10], $\Lambda$ is the conjugate
of $I$. Whereas $I$ is not identically $\infty$, it is a proper convex function. Moreover,
since $I$ is lower semicontinuous, it is a closed convex function as well
(cf. [25], page 52). Then, since $\Lambda$ is essentially smooth, we have from Theorem
26.3 of [25] that $I$ is strictly convex on the relative interior of its domain
of finiteness $Q_C$.

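$Q_C$ is convex and $Q_C \subset \mathbb{K}$. Let $x, y \in Q_C$. The convexity of $I$ implies
that $I((x + y)/2) \le (I(x) + I(y))/2 < \infty$. Hence, $Q_C$ is convex.
For $\lambda \in \mathbb{R}^d$, let $\bar{\lambda} = (|\lambda_1|, \ldots, |\lambda_d|)$. Then

\[
\exp\Big\{ -\Big\langle \bar{\lambda}, \Big(\max_i |f(i)|\Big) 1_d \Big\rangle \Big\} P_C \le \Pi_{C,\lambda} \le \exp\Big\{ \Big\langle \bar{\lambda}, \Big(\max_i |f(i)|\Big) 1_d \Big\rangle \Big\} P_C.
\]

Whereas the Perron-Frobenius value of $P_C$ is 1, we have

\[
\exp\Big\{ -\Big\langle \bar{\lambda}, \Big(\max_i |f(i)|\Big) 1_d \Big\rangle \Big\} \le \rho(\lambda) \le \exp\Big\{ \Big\langle \bar{\lambda}, \Big(\max_i |f(i)|\Big) 1_d \Big\rangle \Big\}.
\]

Now let $x$ be such that $x_j > \max_i |f(i)|$ for some $1 \le j \le d$. Then, for
$\alpha \in \mathbb{R}$, let $\lambda^{j,\alpha} \in \mathbb{R}^d$ be such that $\lambda^{j,\alpha}_i = 0$ for $i \ne j$ and $\lambda^{j,\alpha}_j = \alpha$. We have
then

\[
\begin{aligned}
I(x) &\ge \sup_{\lambda} \Big\{ \langle \lambda, x \rangle - \Big\langle \bar{\lambda}, \Big(\max_i |f(i)|\Big) 1_d \Big\rangle \Big\}\\
&\ge \langle \lambda^{j,\alpha}, x \rangle - \Big\langle \bar{\lambda}^{j,\alpha}, \Big(\max_i |f(i)|\Big) 1_d \Big\rangle \ge \alpha x_j - |\alpha| \max_i |f(i)|.
\end{aligned}
\]

By taking $\alpha \uparrow \infty$, we have that $I(x) = \infty$. Similarly, if $x_j < -\max_i |f(i)|$, then
$I(x) = \infty$. Thus, $I(x) < \infty$ implies $\max_i |x_i| \le \max_i |f(i)|$ and so $Q_C \subset \mathbb{K}$.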

$Q_C$ is compact. If $I$ can be shown to be uniformly bounded on $Q_C$, then
the lower semicontinuity of $I$ will imply that $Q_C$ is closed. Also, since it was
shown above that $Q_C$ is bounded, $Q_C$ will then be compact.

Let $p$ be the smallest positive entry in $P_C$ and let $G = \{x : I(x) \le -\log p\}$.
By the lower semicontinuity of $I$, $G$ is a closed set. Let $x_0 \in G^c$. We show
that $I(x_0) = \infty$ and hence $Q_C \subset G$.

Since $G^c$ is open, let $B = B(x_0; \delta) \subset G^c$ be a closed ball around $x_0$ with
some radius $\delta > 0$. If now $\limsup (1/n) \log \mu_n(Z_n \in B) > -\infty$, then there
exists a sequence $\{\mathbf{x}_{n_k}\}$ such that $\sum_{i=1}^{n_k} f(x_i)/n_k \in B$ and $\mu_{n_k}(\mathbf{X}_{n_k} = \mathbf{x}_{n_k}) > 0$.
However, we have $\mu_{n_k}(\mathbf{X}_{n_k} = \mathbf{x}_{n_k}) \ge p^{n_k}$, and so $\limsup (1/n_k) \log \mu_{n_k}(Z_{n_k} \in
B) \ge \log p$. Hence, using the LD upper bound,

\[
-I(B) \ge \limsup \frac{1}{n} \log \mu_n(Z_n \in B) \ge \log p.
\]

However, since $I$ is lower semicontinuous, $I(B) = I(x_1)$ on some point $x_1$ in
the compact set $B \subset G^c$. Hence, $I(B) > -\log p$, giving a contradiction.
Therefore, we must have $I(x_0) = \infty$ because

\[
-\infty = \limsup \frac{1}{n} \log \mu_n(Z_n \in B) \ge \liminf \frac{1}{n} \log \mu_n(Z_n \in B) \ge -I(B^\circ) \ge -I(x_0).
\]

$I$ is uniformly continuous on $Q_C$. Whereas $I$ is convex, $I$ restricted to
$Q_C$ is continuous. Since $Q_C$ is compact, $I$ is in fact uniformly continuous on
$Q_C$.

$I$ is a good rate function. Whereas $I$ is lower semicontinuous, the level
set $\{x : I(x) \le a\}$ for $a \in \mathbb{R}$ is a closed subset of $Q_C$ and hence compact.

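A.4. Proof of Proposition 4.1. When $M = 1$, $P(C_1)$ is stochastic and
$\mathbb{J}_{\mathcal{U}} = I_1$, and the proof follows from Proposition 2.1. Suppose now that $M \ge
2$. Consider that $\mathbb{J}_{\mathcal{U}} \le \min\{I_i : i \in \mathcal{G}\}$ and so $Q_{\mathbb{J}_{\mathcal{U}}} \supset \bigcup_{i \in \mathcal{G}} Q_i$ is nonempty.
Also, $Q_{\mathbb{J}_{\mathcal{U}}} \subset \mathbb{K}$: Indeed, for $z \notin \mathbb{K}$, and $v \in \Omega_M$ and $x \in D(M, v, z)$ we must
have that $v_i > 0$ and $x_i \notin \mathbb{K}$ for some $1 \le i \le M$. Then $C_{v,\mathcal{U}}(\sigma, x) = \infty$ and
so $\mathbb{J}_{\mathcal{U}}(z) = \infty$.

In addition, $\mathbb{J}_{\mathcal{U}}$ is lower semicontinuous and nonnegative because the $\{I_i\}$ that
correspond to substochastic matrices $\{P(i) : i \in \mathcal{G}\}$ are rate functions with
compact domains of finiteness. Finally, $\mathbb{J}_{\mathcal{U}}$ is a good rate function from the
same argument given for Proposition 2.1.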
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